[PERFORM] optimizing immutable vs. stable function calls?
I originally thought I had a performance problem related to row-level security, but in reducing it to a simple test case I realized it applies more generally: the query planner does not seem to recognize that it can eliminate redundant calls to a STABLE function. It will optimize the same call if the function is marked IMMUTABLE. In my case, the function call takes no arguments and is thus trivially independent of row data, and it appears in a WHERE clause being compared to constants. Why wouldn't the optimizer treat this case the same as IMMUTABLE?

Toy problem
--

This abstracts a scenario where a web application stores client identity attributes (i.e. roles or groups) into a config variable which can be accessed from a stored procedure to allow policy checks against that client context. The example query emulates a policy that is a disjunction of two rules:

  1. clients who intersect a constant ACL:   '{C,E,Q}'::text[] && current_attributes()
  2. rows whose "cls" intersects a constant mask:   cls = ANY ('{0,1,5}'::text[])

This is a common idiom for us, where some rows are restricted from the general user base but certain higher-privilege clients can see all rows. So our test query is simply:

  SELECT *
  FROM mydata
  WHERE '{C,E,Q}'::text[] && current_attributes()
     OR cls = ANY ('{0,1,5}'::text[]);

Test setup
--

I set a very high cost on the function to try to encourage the planner to optimize away the calls, but it doesn't seem to make any difference. Based on some other discussions I have seen, I also tried declaring it LEAKPROOF but saw no change in behavior there either.
  CREATE OR REPLACE FUNCTION current_attributes() RETURNS text[]
  STABLE COST 1000000
  AS $$
  BEGIN
    RETURN current_setting('mytest.attributes')::text[];
  END;
  $$ LANGUAGE plpgsql;

  CREATE TABLE mydata (
    id serial PRIMARY KEY,
    val integer,
    cls text
  );

  INSERT INTO mydata (val, cls)
  SELECT v, (v % 13)::text
  FROM generate_series(1, 1000000, 1) AS s (v);

  CREATE INDEX ON mydata (cls);
  ANALYZE mydata;

Resulting plans and performance
--

These results are with PostgreSQL 9.5 on a Fedora 25 workstation, but I see essentially the same behavior on 9.6 as well. For an intersecting ACL scenario, I set the client context as:

  SELECT set_config('mytest.attributes', '{A,B,C,D}', False);

and for non-intersecting, I set:

  SELECT set_config('mytest.attributes', '{}', False);

In an ideal world, the planner knows it can solve the ACL intersection once, independent of any row data, and then form a different plan optimized around that answer, the same as if we'd just put a constant true or false term in our WHERE clause.

A. STABLE function for intersecting ACL

  Seq Scan on mydata  (cost=0.00..2500021656.00 rows=238030 width=10) (actual time=0.053..1463.382 rows=1000000 loops=1)
    Filter: (('{C,E,Q}'::text[] && current_attributes()) OR (cls = ANY ('{0,1,5}'::text[])))
  Planning time: 0.093 ms
  Execution time: 1500.395 ms

B. IMMUTABLE function for intersecting ACL

  Seq Scan on mydata  (cost=0.00..15406.00 rows=1000000 width=10) (actual time=0.009..78.474 rows=1000000 loops=1)
  Planning time: 0.247 ms
  Execution time: 117.610 ms

C. STABLE function for non-intersecting ACL

  Seq Scan on mydata  (cost=0.00..2500021656.00 rows=238030 width=10) (actual time=0.179..1190.484 rows=230770 loops=1)
    Filter: (('{C,E,Q}'::text[] && current_attributes()) OR (cls = ANY ('{0,1,5}'::text[])))
    Rows Removed by Filter: 769230
  Planning time: 0.088 ms
  Execution time: 1199.729 ms
D. IMMUTABLE function for non-intersecting ACL

  Bitmap Heap Scan on mydata  (cost=4058.36..12631.44 rows=230333 width=10) (actual time=32.444..76.934 rows=230770 loops=1)
    Recheck Cond: (cls = ANY ('{0,1,5}'::text[]))
    Heap Blocks: exact=5406
    ->  Bitmap Index Scan on mydata_cls_idx  (cost=0.00..4000.78 rows=230333 width=0) (actual time=31.012..31.012 rows=230770 loops=1)
          Index Cond: (cls = ANY ('{0,1,5}'::text[]))
  Planning time: 0.331 ms
  Execution time: 87.475 ms

You can see the roughly 10-15x performance difference above. In our real application, with more sprawling data, sorting, and lots of available column indices, the effects are even more pronounced. Is there any hope for the planner to start optimizing these row-independent, stable function calls the same way it does immutable ones?

We tend to start a transaction, set the config parameter with the web client identity attributes, and then run the other performance-sensitive statements to completion (or error) in the same transaction. We don't further modify the parameter during the lifetime of one web request handler. We cycle through different parameter settings only when we reuse a connection for multiple web requests, which may all be from different clients.
Re: [PERFORM] optimizing immutable vs. stable function calls?
Karl Czajkowski writes:
> The query planner does not seem to
> recognize that it can eliminate redundant calls to a STABLE function.

No, it doesn't.

> In my case, the function call does not take any arguments and is thus
> trivially independent of row data, and appears in a WHERE clause being
> compared to constants. Why wouldn't the optimizer treat this case the
> same as IMMUTABLE?

"The same as IMMUTABLE" would be to reduce the function to a constant at plan time, which would be the wrong thing. It would be valid to execute it only once at query start, but there's no built-in mechanism for that.

But you could force it by putting it in a sub-SELECT; that is, if you don't like the performance of

  SELECT ... slow_stable_function() ...

try this:

  SELECT ... (SELECT slow_stable_function()) ...

That works because it's an uncorrelated sub-query, which gets evaluated just once per run. But the overhead associated with that mechanism is high enough that forcing it automatically for every stable function would be a loser. I'd recommend doing it only where it *really* matters.

			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
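[Editorial note: applied to the toy query from the first message, the sub-SELECT rewrite Tom suggests would look like the following sketch. The expectation that the filter references an InitPlan result is an inference, not output from the thread.]

```sql
-- Wrapping the STABLE function call in a scalar sub-SELECT makes it an
-- uncorrelated sub-query, which PostgreSQL plans as an InitPlan and
-- evaluates once per query execution rather than once per row:
SELECT *
FROM mydata
WHERE '{C,E,Q}'::text[] && (SELECT current_attributes())
   OR cls = ANY ('{0,1,5}'::text[]);
```

In EXPLAIN output we would expect the filter to compare against the InitPlan's parameter (e.g. `$0`) instead of showing a per-row call to `current_attributes()`.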
Re: [PERFORM] optimizing immutable vs. stable function calls?
On Wed, Jan 18, 2017 at 3:54 PM, Tom Lane wrote:
> Karl Czajkowski writes:
> > The query planner does not seem to
> > recognize that it can eliminate redundant calls to a STABLE function.
>
> No, it doesn't.
>
> > In my case, the function call does not take any arguments and is thus
> > trivially independent of row data, and appears in a WHERE clause being
> > compared to constants. Why wouldn't the optimizer treat this case the
> > same as IMMUTABLE?
>
> "The same as IMMUTABLE" would be to reduce the function to a constant at
> plan time, which would be the wrong thing. It would be valid to execute
> it only once at query start, but there's no built-in mechanism for that.

I'm feeling a bit dense here, but even after having read a number of these kinds of interchanges I still can't get it to stick. I think part of the problem is this sentence from the docs:

https://www.postgresql.org/docs/current/static/xfunc-volatility.html

(Stable): "This category allows the optimizer to optimize multiple calls of the function to a single call"

I read that sentence (and the surrounding paragraph) and wonder why, then, it doesn't do so in this case. If PostgreSQL cannot execute it only once at query start, then all this talk about optimization seems misleading. At worst there should be a sentence explaining when the optimizations noted in that paragraph cannot occur - and probably examples of both, since it's not clear when they can occur. Some TLC to the docs here would be welcomed.

David J.
Re: [PERFORM] optimizing immutable vs. stable function calls?
On Jan 18, Tom Lane modulated:
> "The same as IMMUTABLE" would be to reduce the function to a constant at
> plan time, which would be the wrong thing. It would be valid to execute
> it only once at query start, but there's no built-in mechanism for that.

That's what I was afraid of... right, we'd like for it to partially evaluate these and then finish the plan.

Is there a correctness hazard with pretending our function is IMMUTABLE, even though we will change the underlying config parameter in the same connection? It would be very bad if we changed our parameter to reflect a different web client identity context, but then somehow got cached plans based on the previous setting...

> But you could force it by putting it in a sub-SELECT; that is, if you
> don't like the performance of
>
>   SELECT ... slow_stable_function() ...
>
> try this:
>
>   SELECT ... (SELECT slow_stable_function()) ...

Ha, you might recall a while back I was having problems with round-tripping our RLS policies, because I had tried such sub-selects which return arrays, and the internal format lost the casts needed to get the correct parse when reloading a dump... :-)

karl
Re: [PERFORM] optimizing immutable vs. stable function calls?
"David G. Johnston" writes:
> I'm feeling a bit dense here, but even after having read a number of these
> kinds of interchanges I still can't get it to stick. I think part of the
> problem is this sentence from the docs:
> https://www.postgresql.org/docs/current/static/xfunc-volatility.html
> (Stable): "This category allows the optimizer to optimize multiple calls
> of the function to a single call"
> I read that sentence (and the surrounding paragraph) and wonder why, then,
> it doesn't do so in this case.

It says "allows", it doesn't say "requires". The reason we have this category is that without it, it would be formally invalid to optimize an expression involving a non-immutable function into an index comparison value, because in that context the function is indeed only evaluated once (before the comparison value is fed into the index machinery). But there isn't a mechanism for that behavior outside of index scans.

> If PostgreSQL cannot execute it only once at query start, then all this
> talk about optimization seems misleading. At worst there should be a
> sentence explaining when the optimizations noted in that paragraph cannot
> occur - and probably examples of both, since it's not clear when they can
> occur.

If you want an exact definition of when things will happen or not happen, start reading the source code. I'm loath to document small optimizer details since they change all the time.

			regards, tom lane
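[Editorial note: the index-comparison-value case Tom describes can be sketched against the toy schema from upthread. The `current_class()` function here is hypothetical, invented for illustration; it is not part of the original test case, and whether the planner actually picks the index scan depends on statistics.]

```sql
-- Hypothetical STABLE function returning a single text value:
CREATE FUNCTION current_class() RETURNS text STABLE AS $$
BEGIN
  RETURN current_setting('mytest.class');
END;
$$ LANGUAGE plpgsql;

-- Because the function is STABLE (not VOLATILE), the planner is allowed
-- to use its result as an index comparison value: the function is
-- evaluated once and the result fed into the index machinery.
EXPLAIN SELECT * FROM mydata WHERE cls = current_class();
-- e.g.  Index Scan using mydata_cls_idx on mydata
--         Index Cond: (cls = current_class())
```

This is the one optimization the STABLE category was introduced to permit; outside of an index scan condition, the same call falls back to per-row evaluation.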
Re: [PERFORM] optimizing immutable vs. stable function calls?
Karl Czajkowski writes:
> Is there a correctness hazard with pretending our function is
> IMMUTABLE, even though we will change the underlying config parameter
> in the same connection?

You could probably get away with that if you never ever use prepared queries (beware that almost anything in plpgsql is a prepared query). It's a trick that's likely to bite you eventually though.

			regards, tom lane
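[Editorial note: a sketch of how the trick can bite, assuming the function from upthread is redeclared IMMUTABLE. The statement name `q` is illustrative; the exact point at which the call gets folded to a constant is up to the planner, hence the hedged comments.]

```sql
-- The risky "pretend" step: same body, but declared IMMUTABLE.
CREATE OR REPLACE FUNCTION current_attributes() RETURNS text[]
IMMUTABLE AS $$
BEGIN
  RETURN current_setting('mytest.attributes')::text[];
END;
$$ LANGUAGE plpgsql;

SELECT set_config('mytest.attributes', '{A,B,C,D}', False);

-- Planning this statement may fold the IMMUTABLE call to a constant,
-- baking '{A,B,C,D}' into the cached plan:
PREPARE q AS
  SELECT * FROM mydata
  WHERE '{C,E,Q}'::text[] && current_attributes()
     OR cls = ANY ('{0,1,5}'::text[]);

-- Later, the same connection is reused for a different web client:
SELECT set_config('mytest.attributes', '{}', False);

-- The cached plan can still reflect the old setting, silently applying
-- the wrong policy to the new client:
EXECUTE q;
```

The same hazard applies to statements inside plpgsql functions, since their plans are cached across calls.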
Re: [PERFORM] optimizing immutable vs. stable function calls?
On Wed, Jan 18, 2017 at 4:23 PM, Tom Lane wrote:
> "David G. Johnston" writes:
> > I'm feeling a bit dense here but even after having read a number of these
> > kinds of interchanges I still can't get it to stick. I think part of the
> > problem is this sentence from the docs:
> > https://www.postgresql.org/docs/current/static/xfunc-volatility.html
> > (Stable): "This category allows the optimizer to optimize multiple calls
> > of the function to a single call"
> > If PostgreSQL cannot execute it only once at query start then all this
> > talk about optimization seems misleading. At worst there should be a
> > sentence explaining when the optimizations noted in that paragraph cannot
> > occur - and probably examples of both, since it's not clear when they
> > can occur.
>
> If you want an exact definition of when things will happen or not happen,
> start reading the source code. I'm loath to document small optimizer
> details since they change all the time.

That would not be a productive exercise for me, or for most people who just want some idea of what behavior to expect when they write and use a STABLE function (IMMUTABLE and VOLATILE seem fairly easy to reason about). Is there anything fatally wrong with the following comprehension?

"""
A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to take an expression of the form (indexed_column = stable_function(...)) and evaluate stable_function(...) once at the beginning of the query, using the result to scan the index. (Since an index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILE function in an index scan condition.) ?Note that should an index scan not be chosen for the plan, the function will be invoked once per row?

Expressions of the forms (constant = stable_function()) and (SELECT stable_function() FROM generate_series(1,5)) are not presently optimized to a single per-query evaluation. To obtain the equivalent, you can invoke the function in a sub-query or CTE and reference the result wherever it is needed.
"""

It probably isn't perfect, but if the average user isn't going to benefit from anything besides "indexed_column = function()" in an index plan, then the false hope held out by the wording "allows ... in particular" should probably be dispelled.

Thanks!

David J.
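[Editorial note: the sub-query/CTE workaround mentioned in the proposed wording could look like the following sketch against the toy schema from upthread. On the PostgreSQL versions discussed in this thread (9.5/9.6), a CTE is always materialized, so the function body runs exactly once.]

```sql
-- Evaluate the stable function once in a CTE, then reference the result:
WITH acl AS (
  SELECT current_attributes() AS attrs
)
SELECT m.*
FROM mydata m, acl
WHERE '{C,E,Q}'::text[] && acl.attrs
   OR m.cls = ANY ('{0,1,5}'::text[]);
```

(From PostgreSQL 12 onward, CTEs can be inlined; a scalar sub-SELECT, as suggested earlier in the thread, remains a single InitPlan evaluation on all versions.)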
Re: [PERFORM] optimizing immutable vs. stable function calls?
On Jan 18, Tom Lane modulated:
> Karl Czajkowski writes:
> > Is there a correctness hazard with pretending our function is
> > IMMUTABLE, even though we will change the underlying config parameter
> > in the same connection?
>
> You could probably get away with that if you never ever use prepared
> queries (beware that almost anything in plpgsql is a prepared query).
> It's a trick that's likely to bite you eventually though.

That sounds unnerving. I think I need to play it safe. :-/

Does the plan cache disappear with each connection/backend process? Or is there also a risk of plans being shared between backends?

Would it be invasive or a small hack to have something like "transaction-immutable", which could be precomputed during planning like IMMUTABLE, but which would require discarding those plans at the end of the transaction...?

karl