How do PostgreSQL security_barrier views work?

You may have noticed that PostgreSQL 9.2 added support for security_barrier views. I have been looking at that code with an eye to adding automatic update support for them, as part of ongoing row-level security work for the AXLE project, and I thought I would take the chance to explain how they work.

Robert has already explained why such views are useful and what they protect against (this is also discussed in "What's new in PostgreSQL 9.2"). Now I would like to go into how they work and discuss how security_barrier views interact with automatically updatable views.

Normal views


A normal simple view is expanded, macro-like, into a subquery, which the optimizer then usually flattens by pulling its predicate up and appending it to the conditions of the containing query. This may become clearer with an example. Given a table:

CREATE TABLE t AS SELECT n, 'secret'||n AS secret FROM generate_series(1,20) n;

and a view:

CREATE VIEW t_odd AS SELECT n, secret FROM t WHERE n % 2 = 1;

the query:

SELECT * FROM t_odd WHERE n < 4

will be converted internally into something like:

SELECT * FROM (SELECT * FROM t WHERE n % 2 = 1) t_odd WHERE n < 4

which the optimizer then flattens into a single-level query by eliminating the subquery and appending its WHERE clause terms to the outer query:

SELECT * FROM t t_odd WHERE (n % 2 = 1) AND (n < 4)

You cannot see these intermediate queries directly, and they never exist as real SQL, but you can observe the process by setting debug_print_parse = on, debug_print_rewritten = on and debug_print_plan = on in postgresql.conf. I will not reproduce the parse and plan trees here, as they are quite bulky and are easy to generate from the examples above.
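One way to see those trees without editing postgresql.conf is to enable the same settings for a single session. The following is a minimal sketch, assuming a psql session connected to a database that contains the example table and view above. The trees are emitted at LOG level, so client_min_messages is lowered here to have them sent to the client as well as to the server log.

```sql
-- Enable the debug output for the current session only.
SET debug_print_parse = on;
SET debug_print_rewritten = on;
SET debug_print_plan = on;

-- The trees are logged at LOG level; this also sends them to the client.
SET client_min_messages = log;

-- Any query run now is preceded by its parse, rewritten and plan trees.
SELECT * FROM t_odd WHERE n < 4;

-- Restore the defaults afterwards.
RESET debug_print_parse;
RESET debug_print_rewritten;
RESET debug_print_plan;
RESET client_min_messages;
```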

The problem with using views for security


You may think that granting someone access to a view, without granting access to the underlying table, will stop them from seeing the even-numbered rows. At first that looks true:

regress=> SELECT * FROM t_odd WHERE n < 4;
 n | secret
---+---------
 1 | secret1
 3 | secret3
(2 rows)

but when you look at the plan, you will be able to see the potential problem:

regress=> EXPLAIN SELECT * FROM t_odd WHERE n < 4;
                    QUERY PLAN
---------------------------------------------------
 Seq Scan on t  (cost=0.00..31.53 rows=2 width=36)
   Filter: ((n < 4) AND ((n % 2) = 1))
(2 rows)

The view's subquery has been optimized away, and its conditions have been appended directly to the outer query.

In SQL, AND and OR are not ordered. The optimizer and executor are free to run first whichever branch they think is more likely to give a quick answer and possibly let them avoid running the other branches. So if the planner thinks that n < 4 is much cheaper than n % 2 = 1, it will evaluate that first. Looks harmless enough, right? Try:

regress=> CREATE OR REPLACE FUNCTION f_leak(text) RETURNS boolean AS $$
BEGIN
    -- Leak the argument through a NOTICE, then accept every row.
    RAISE NOTICE 'Secret is: %', $1;
    RETURN true;
END;
$$ COST 1 LANGUAGE plpgsql;

regress=> SELECT * FROM t_odd WHERE f_leak(secret) AND n < 4;
NOTICE: Secret is: secret1
NOTICE: Secret is: secret2
NOTICE: Secret is: secret3
NOTICE: Secret is: secret4
NOTICE: Secret is: secret5
NOTICE: Secret is: secret6
NOTICE: Secret is: secret7
NOTICE: Secret is: secret8
NOTICE: Secret is: secret9
NOTICE: Secret is: secret10
NOTICE: Secret is: secret11
NOTICE: Secret is: secret12
NOTICE: Secret is: secret13
NOTICE: Secret is: secret14
NOTICE: Secret is: secret15
NOTICE: Secret is: secret16
NOTICE: Secret is: secret17
NOTICE: Secret is: secret18
NOTICE: Secret is: secret19
NOTICE: Secret is: secret20
 n | secret
---+---------
 1 | secret1
 3 | secret3
(2 rows)

regress=> EXPLAIN SELECT * FROM t_odd WHERE f_leak(secret) AND n < 4;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on t  (cost=0.00..34.60 rows=1 width=36)
   Filter: (f_leak(secret) AND (n < 4) AND ((n % 2) = 1))
(2 rows)

Oops! As you can see, the user-supplied predicate function was considered cheaper to run than the other tests, so it was passed every row before the view's predicate excluded the unsuitable ones. A malicious function could use the same trick to copy the rows.
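The scenario above assumes a user who can read the view but not the table. A minimal sketch of such a setup follows; the role name alice is hypothetical and not part of the original example, and the SET ROLE step assumes the session belongs to a superuser or to a member of that role.

```sql
-- Hypothetical low-privilege role; the name is illustrative only.
CREATE ROLE alice;

-- alice may read the view. No privileges on the base table t are granted.
GRANT SELECT ON t_odd TO alice;

-- Switch to the restricted role for the rest of the session.
SET ROLE alice;

-- Direct access to the base table is refused with a permission error.
SELECT * FROM t;

-- Access through the view is allowed, because the view's reference to t
-- is checked against the privileges of the view owner.
SELECT * FROM t_odd WHERE n < 4;

-- Return to the original role.
RESET ROLE;
```

Run as such a role, the f_leak query above would expose the even-numbered rows even though the role has no privileges on t itself.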

security_barrier views


security_barrier views fix this by forcing the view's conditions to be evaluated first, before any user-supplied conditions are applied. Instead of expanding the view and appending its conditions to the outer query, PostgreSQL replaces the reference to the view with a subquery. This subquery has the security_barrier flag set on its range table entry, which tells the optimizer that it must not flatten the subquery or push outer-query conditions down into it, as it would for a normal subquery.

So with a security barrier view:

CREATE VIEW t_odd_sb WITH (security_barrier) AS SELECT n, secret FROM t WHERE n % 2 = 1;

we will get:

regress=> SELECT * FROM t_odd_sb WHERE f_leak(secret) AND n < 4;
NOTICE: Secret is: secret1
NOTICE: Secret is: secret3
 n | secret
---+---------
 1 | secret1
 3 | secret3
(2 rows)

regress=> EXPLAIN SELECT * FROM t_odd_sb WHERE f_leak(secret) AND n < 4;
                          QUERY PLAN
---------------------------------------------------------------
 Subquery Scan on t_odd_sb  (cost=0.00..31.55 rows=1 width=36)
   Filter: f_leak(t_odd_sb.secret)
   ->  Seq Scan on t  (cost=0.00..31.53 rows=2 width=36)
         Filter: ((n < 4) AND ((n % 2) = 1))
(4 rows)

The query plan should tell you what is going on, though it does not show the security barrier attribute in the EXPLAIN output. The nested subquery forces a scan of t with the view's conditions, and the user-supplied function is then run on the result of the subquery.
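Although EXPLAIN does not show the attribute, it is stored with the view's other options in the system catalog. A minimal sketch of how to check it, assuming the two example views above exist in the current database:

```sql
-- View options are kept in pg_class.reloptions.
-- The plain view t_odd has no options set, while t_odd_sb carries the
-- security_barrier option it was created with.
SELECT relname, reloptions
FROM pg_class
WHERE relname IN ('t_odd', 't_odd_sb')
ORDER BY relname;
```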

But wait a second. Why does the user-supplied predicate n < 4 also appear inside the subquery? Isn't that a potential security hole? If n < 4 is pushed down, why isn't f_leak(secret)?

LEAKPROOF operators and functions


The explanation is that the < operator is marked LEAKPROOF. This attribute indicates that an operator or function is trusted not to leak information, so it can safely be pushed down into security_barrier views. For obvious reasons, you cannot set the LEAKPROOF attribute as an ordinary user:

regress=> ALTER FUNCTION f_leak(text) LEAKPROOF;
ERROR: only superuser can define a leakproof function

Superusers can already do whatever they want, so they have no need to resort to tricks with leaky functions to get past a security barrier view.
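The attribute can be inspected in the system catalogs. The sketch below assumes that column n is of type integer, as it is when the table is built from generate_series(1,20), and looks up the function that implements the integer < operator together with its leakproof flag:

```sql
-- pg_operator.oprcode names the function behind each operator;
-- pg_proc.proleakproof records whether that function is LEAKPROOF.
SELECT o.oprname, p.proname, p.proleakproof
FROM pg_operator AS o
JOIN pg_proc AS p ON p.oid = o.oprcode
WHERE o.oprname = '<'
  AND o.oprleft = 'int4'::regtype
  AND o.oprright = 'int4'::regtype;

-- The user-defined f_leak function is not leakproof unless a superuser
-- marks it so.
SELECT proname, proleakproof
FROM pg_proc
WHERE proname = 'f_leak';
```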

Why you can't update security_barrier views


Simple views in PostgreSQL 9.3 are automatically updatable, but security_barrier views are not considered "simple". This is because updating a view relies on being able to flatten the view's subquery away, turning the update into a plain update of the underlying table. The whole point of security_barrier views is to prevent that flattening. UPDATE currently cannot operate on a subquery directly, so PostgreSQL rejects any attempt to update a security_barrier view:

regress=> UPDATE t_odd SET secret = 'secret_haha' || n;
UPDATE 10
regress=> UPDATE t_odd_sb SET secret = 'secret_haha' || n;
ERROR: cannot update view "t_odd_sb"
DETAIL: Security-barrier views are not automatically updatable.
HINT: To enable updating the view, provide an INSTEAD OF UPDATE trigger or an unconditional ON UPDATE DO INSTEAD rule.

It is this restriction that I am interested in lifting as part of the row-level security work for the AXLE project. Kohei KaiGai has done a tremendous amount of work on row-level security, and features such as security_barrier and LEAKPROOF largely arose from his work toward adding row-level security to PostgreSQL. The next challenge is how to handle updates through a security barrier safely, and in a way that will remain maintainable in the future.
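Until then, the HINT in the error message points at the available workaround. The following is a minimal sketch of an INSTEAD OF UPDATE trigger for the example view. The function and trigger names are hypothetical, the sketch only propagates changes to the secret column and ignores any attempt to change n, and it assumes that the user running the UPDATE is allowed to modify t (otherwise the function would have to be declared SECURITY DEFINER by the table owner).

```sql
-- Hypothetical trigger function: applies the new secret to the base row
-- that corresponds to the view row being updated.
CREATE FUNCTION t_odd_sb_update() RETURNS trigger AS $$
BEGIN
    UPDATE t SET secret = NEW.secret WHERE n = OLD.n;
    RETURN NEW;  -- a non-null result counts the row as updated
END;
$$ LANGUAGE plpgsql;

-- Fire the function in place of the UPDATE, once per affected view row.
CREATE TRIGGER t_odd_sb_instead_update
    INSTEAD OF UPDATE ON t_odd_sb
    FOR EACH ROW EXECUTE PROCEDURE t_odd_sb_update();

-- With the trigger in place the UPDATE is no longer rejected.
UPDATE t_odd_sb SET secret = 'secret_haha' || n;
```

The rows passed to the trigger are still produced by scanning the view, so only rows that satisfy the view's condition are visited.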

Why subqueries?


You may wonder why we use subqueries for this. I did. The short version is that we don't have to, but if we don't use subqueries, we instead need to create new, order-sensitive variants of the AND and OR operators and teach the optimizer that it cannot move conditions across them. Since views are already expanded as subqueries, it is much simpler to mark subqueries as fences that block conditions from being pulled up out of them or pushed down into them.

PostgreSQL already has a short-circuiting, ordered operation: CASE. The problem with using CASE is that no operation can be moved across the boundary of a CASE, not even a LEAKPROOF one. Nor can the optimizer make decisions about index use based on an expression inside a CASE. So if we used CASE, as I once asked about, we could never use an index to satisfy a user-supplied condition.

In the code


Support for security_barrier was added in commit 0e4611c0234d89e288a53351f775c59522baed7c, and it was enhanced with LEAKPROOF support in commit cd30728fb2ed7c367d545fc14ab850b5fa2a4850. Credits appear in the commit notes. Thanks to everyone who took part.

P.S. This article is relatively old, but it is important as an introduction to the translation of the next article.
Article based on information from habrahabr.ru
