Safety when working with PostgreSQL

The list of rules
Almost every item below comes from a sad story, full of pain and perseverance. And the items marked with the word "Pain!" come from stories that still make me shudder at night.
Version the database schema
The database schema is code that you wrote. It belongs in a version control system and should be versioned together with the rest of the project. For PostgreSQL I like Pyrseas for this purpose. Its dbtoyaml tool turns the schema, with all its PostgreSQL-specific objects, into a YAML file, and that file is what gets versioned. Such a file is convenient to work with in branches and to merge, unlike a plain SQL dump. As the final step, the YAML file is compared with the actual database schema (yamltodb) and the SQL migration is generated automatically.
Pain! Never apply changes directly to the production database
Even if the change is simple, terribly urgent and very desirable. First apply it to the developers' database, commit it to a branch, then apply the changes to the trunk database (which is identical to the production one). And only then, when everything is fine in the trunk, apply it to production. It is slow and paranoid, but it saves you from many problems.
Pain! Before writing a delete or update, write the where
And before running the code, exhale, count to three and make sure you are connected to the database you actually mean. About truncate I will say nothing at all: do not even start it without three Lord's Prayers, amen!
UPD. koropovskiy: it is also useful to switch autocommit off for the current session (in psql: \set AUTOCOMMIT off).
tgz: or write begin before every update and delete, as in the sketch below.
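A minimal sketch of this discipline, assuming a hypothetical orders table: wrap the statement in an explicit transaction, check the reported row count, and only then commit.

    begin;
    -- the where clause is written before the statement is ever run
    update orders set status = 'cancelled' where order_id = 42;
    -- if the row count is not what you expected: rollback;
    commit;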
Test Driven Development
Always write the tests first, and only then create the database objects. This applies to objects of every kind: schemas, tables, functions, types, extensions — no exceptions! It seems hard at first, but later you will thank yourself. Even when creating the schema for the first time it is easy to miss something. And when you refactor a table six months later, only the tests you wrote will save you from a sudden shot in the foot inside some function. PostgreSQL has a wonderful extension for this, pgTAP. I recommend creating a "<schema_name>_tap" schema for every schema and writing the test functions there. Then just run the tests with pg_prove.
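A minimal pgTAP sketch, assuming a hypothetical billing schema with a billing_tap schema next to it for the tests:

    -- run from the shell with, e.g.: pg_prove -d mydb billing_tap_tests.sql
    create extension if not exists pgtap;

    begin;
    select plan(2);
    select has_table('billing'::name, 'invoices'::name,
                     'billing.invoices exists');
    select has_column('billing'::name, 'invoices'::name, 'invoice_id'::name,
                      'invoices has its key column');
    select * from finish();
    rollback;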
Don't forget to configure PITR
I'm afraid of playing Captain Obvious here, but every database needs backups configured. And preferably the ability to restore the database to an arbitrary point in time. This is needed not only for recovery: it also gives developers many interesting possibilities for working with historical slices of the database. In PostgreSQL, barman handles this.
Data consistency
Inconsistent data in a database never leads to anything good. Even a small amount of it can easily turn the whole database into garbage. So never neglect normalization and constraints such as foreign keys and checks. Use denormalized forms (such as jsonb) only after making sure the schema cannot be implemented in a normalized way with an acceptable level of complexity and performance — denormalized storage can potentially lead to inconsistent data. To all the arguments of the denormalization advocates, reply that normalization was not invented for nothing, and keep a meaningful silence.
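A minimal sketch of the kind of constraints this is about, on hypothetical tables:

    create table customers (
        customer_id bigserial primary key,
        email       text not null unique
    );

    create table orders (
        order_id    bigserial primary key,
        customer_id bigint  not null references customers,
        status      text    not null default 'new',
        total       numeric not null check (total >= 0)
    );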
Create foreign keys as deferrable initially deferred
This defers constraint checking to the end of the transaction, letting you be inconsistent inside it with impunity (in the end everything either becomes consistent or the transaction fails). Moreover, by switching the flag to immediate inside a transaction, you can force the constraints to be checked at exactly the moment you need.
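A short sketch of both modes, reusing the hypothetical orders and customers tables from the previous sketch:

    alter table orders
        add constraint orders_customer_fk
        foreign key (customer_id) references customers
        deferrable initially deferred;

    begin;
    -- temporarily inconsistent: customer 999 does not exist yet
    insert into orders (customer_id, total) values (999, 0);
    -- fixed before commit, so the deferred check passes
    insert into customers (customer_id, email) values (999, 'x@example.com');
    -- or force the check right here instead of at commit:
    -- set constraints orders_customer_fk immediate;
    commit;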
UPD. The comments point out that deferrable is a controversial practice: it simplifies a number of import tasks, but it complicates debugging inside a transaction, and some consider it bad practice to begin with. Still, I stubbornly lean towards thinking it is better to have deferrable keys than not to have them at all; consider this an alternative view of the question.
Do not use the public schema
It is a service schema, meant for the functions of extensions. For your own needs create separate schemas. Treat them as modules and create a new schema for each logically isolated set of entities.
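A tiny sketch of the idea, with a hypothetical billing module:

    -- each logically separate module gets its own schema
    create schema billing;

    create table billing.invoices (
        invoice_id bigserial primary key,
        total      numeric not null check (total >= 0)
    );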
The api schema
For functions called from the application side, create a separate schema "api_v<version_number>". This lets you control precisely which functions are the interfaces to your database. For the function names in this schema you can use the template "<entity>_get/post/patch/delete_<arguments>".
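A sketch of the convention, reusing the hypothetical customers table from above:

    create schema api_v1;

    -- the only thing the application is allowed to call
    create function api_v1.customer_get_by_id(p_customer_id bigint)
    returns table (customer_id bigint, email text)
    language sql stable
    as $$
        select customer_id, email
        from customers
        where customer_id = p_customer_id;
    $$;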
Triggers for auditing
Triggers are the best fit for auditing activity. I recommend creating a generic trigger function that records the activity of any table. To do that, fetch the structure of the target table from information_schema and work out whether the old or the new row should be written, depending on the action performed. With this solution the audit code stays concise and maintainable.
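A minimal sketch of such a generic trigger. This simplified version records the whole row as jsonb instead of consulting information_schema, and the table and trigger names are hypothetical:

    create table audit_log (
        logged_at  timestamptz not null default now(),
        table_name text        not null,
        action     text        not null,
        row_data   jsonb       not null
    );

    create function audit_any_table() returns trigger
    language plpgsql as $$
    begin
        if tg_op = 'DELETE' then
            insert into audit_log (table_name, action, row_data)
            values (tg_table_name, tg_op, to_jsonb(old));
            return old;
        else
            insert into audit_log (table_name, action, row_data)
            values (tg_table_name, tg_op, to_jsonb(new));
            return new;
        end if;
    end;
    $$;

    create trigger customers_audit
        after insert or update or delete on customers
        for each row execute procedure audit_any_table();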
If you plan to use triggers to maintain accumulation registers (running totals), be careful with the logic: one mistake and you get inconsistent data. Rumor has it this is a very dangerous kung fu.
Pain! Importing data into a new schema
The worst, yet regularly occurring, event in the life of a database developer. In PostgreSQL, foreign data wrappers (FDW) help a lot, all the more since they were substantially improved in 9.6 (if the wrapper's developers take care of it, an FDW can now build part of the plan on the remote side). By the way, there is the convenient construct "import foreign schema", which saves you from writing wrappers for a whole pile of tables. It is also good practice to keep a set of functions storing the SQL commands that drop and restore the database's existing foreign and primary keys. I recommend doing the import by first writing a set of views whose data matches the structure of the target tables, and then inserting from them using copy (not insert!). The whole sequence of SQL commands is best kept in a separate versioned file and run through psql with the -1 flag (in a single transaction). By the way, an import is the only case when you may turn off fsync in PostgreSQL — after making a backup and crossing your fingers.
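A sketch of the FDW part, with hypothetical server names and credentials (the file itself would then be run with psql -1 -f import.sql):

    create extension if not exists postgres_fdw;

    create server legacy_db
        foreign data wrapper postgres_fdw
        options (host 'old-host', dbname 'legacy');

    create user mapping for current_user
        server legacy_db
        options (user 'migrator', password 'secret');

    -- one command instead of a wrapper per table
    create schema legacy;
    import foreign schema public
        from server legacy_db into legacy;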
Do not write in SQL:1999
No, really: a lot has happened since then. A whole generation has finished school, and mobile phones have turned from bricks into supercomputers by the standards of 1999. In general, we should not write the way our fathers wrote. Use "with": the code comes out cleaner and reads from top to bottom, instead of zigzagging between blocks of joins. By the way, if a join is on columns with the same name, it is more concise to use "using" rather than "on". And, of course, never use offset in production code. There is also the beautiful "lateral join", which is often forgotten — and at that moment a kitten somewhere in the world becomes sad.
UPD. Using "with" do not forget that the result creates CTE, which eats away the memory and does not support indexes at query to it. So is used too often and out of place "with" may adversely affect query performance. So don't forget to analyze the request through the scheduler. "with" especially good when you need to get the table that will in different ways be used in multiple parts of the query below. And remember, "with a" radically improves the readability of the query and in each new version of PostgreSQL is running everything efficiently. Other things being equal — prefer this design.
Temporary tables
If you can write a query without temporary tables — don't hesitate, write it that way! A CTE, produced by the "with" construct, is usually an acceptable alternative. The thing is that PostgreSQL creates a temporary file for every temporary table... and yes, another sad kitten on the planet.
Pain! The worst anti-pattern in SQL
Never use a construct of the form:

    select myfunc() from some_table;
The execution time of such a query grows linearly with the number of rows. It can always be rewritten into something without a function applied to every row, winning a couple of orders of magnitude in execution speed.
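A sketch of such a rewrite, assuming a hypothetical order_total() function that sums up one order's items:

    -- anti-pattern: the function body runs once per row
    -- select o.order_id, order_total(o.order_id) from orders o;

    -- set-based rewrite: a single pass over order_items
    select o.order_id, coalesce(sum(i.price * i.qty), 0) as total
    from orders o
    left join order_items i using (order_id)
    group by o.order_id;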
The secret of a fast query
If your query is slow on a test machine, it will not be any faster in production. The best analogy here is a road with cars. The test computer is a single-lane road. The production server is a ten-lane road. In rush hour, ten lanes carry far more cars without traffic jams than one lane does. But if your car is an old bucket, it will not drive like a Ferrari, no matter how many lanes you give it.
Use the index, Luke!
How correctly you build and use indexes determines whether a query runs in tenths of a second or in minutes. I recommend reading Markus Winand's site (use-the-index-luke.com) on how b-tree indexes work — it is the best public explanation of balanced trees I have seen on the internet. And his book is great too, yes.
Group by, or window function?
Of course, a window function can do more. But sometimes an aggregation can be computed either way. In such cases I follow this rule: if the aggregation is computed over a covering index — group by only. If there is no covering index, a window function is worth trying.
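A sketch of the same aggregation both ways, on a hypothetical employees table:

    -- group by: one row per department
    select department_id, avg(salary) as avg_salary
    from employees
    group by department_id;

    -- window function: the aggregate attached to every row
    select employee_id, salary,
           avg(salary) over (partition by department_id) as avg_salary
    from employees;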
set_config
set_config can be used not only to set postgresql.conf settings for the scope of a transaction, but also to pass a user-defined variable into the transaction (if it has been pre-defined in postgresql.conf). Such variables can influence the behavior of the called functions in very interesting ways.
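A sketch of passing a variable through a transaction, with a hypothetical app.current_user setting:

    begin;
    -- third argument true = the value lives only until the end of the transaction
    select set_config('app.current_user', '42', true);
    -- any function called inside the transaction can read it back
    select current_setting('app.current_user');
    commit;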
FTS and trigrams
They are wonderful! They give us full-text and fuzzy search while preserving all the power of SQL. Just don't forget to use them.
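A tiny sketch of both, assuming a hypothetical articles table:

    create extension if not exists pg_trgm;

    -- fuzzy search with trigrams: % is the similarity operator
    select title from articles where title % 'postgresql';

    -- full-text search
    select title from articles
    where to_tsvector('english', body) @@ to_tsquery('english', 'index & btree');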
Raise your own exceptions
In a large project you often have to raise many exceptions with your own codes and messages. To avoid confusion, you can keep a registry of exception codes, along with functions to raise them (a wrapper around "raise") and to add and remove them. And if you have covered all your database objects with tests, you will not be able to accidentally remove an exception code that is already used somewhere.
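A minimal sketch of such a wrapper; the function name and the 'AP001' errcode scheme are hypothetical (a user-defined errcode is any five-character SQLSTATE):

    create function raise_app_error(p_code text, p_message text)
    returns void language plpgsql as $$
    begin
        raise exception using
            errcode = p_code,      -- e.g. 'AP001'
            message = p_message;
    end;
    $$;

    -- usage inside some business function:
    -- perform raise_app_error('AP001', 'insufficient funds');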
You can never have too much paranoia
It is good practice not to forget to configure ACLs on tables and to run functions as "security definer". And when functions are read-only, feng shui demands marking them "stable".
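A sketch of the pattern, assuming a hypothetical app_user role and the api_v1 and billing schemas from the earlier sketches:

    -- the application role sees only the api schema, never the tables
    revoke all on all tables in schema billing from app_user;
    grant usage on schema api_v1 to app_user;

    create function api_v1.invoice_get_total(p_invoice_id bigint)
    returns numeric
    language sql
    stable               -- read-only: lets the planner optimize calls
    security definer     -- runs with the owner's rights, not the caller's
    as $$
        select total from billing.invoices where invoice_id = p_invoice_id;
    $$;

    grant execute on function api_v1.invoice_get_total(bigint) to app_user;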
Pain! The icing on the cake
UPD. Never route application users through the server into the database by mapping each application user one-to-one onto a database user. Even if it seems you could configure security for users and groups with PostgreSQL's regular means — never do it, it's a trap! With such a scheme you cannot use connection pooling, and every connected application user captures a resource-hungry database connection. Databases hold hundreds of connections, application servers hold thousands, and that is exactly why load balancers and connection pools exist. With a one-to-one mapping of users onto the database, the scheme will break under growing load and force a rewrite.