[PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread Karl Czajkowski
I originally was thinking I had a performance problem related to
row-level security but in reducing it to a simple test case I realized
it applies more generally.  The query planner does not seem to
recognize that it can eliminate redundant calls to a STABLE function.
It will optimize the same call if the function is marked IMMUTABLE.

In my case, the function call does not take any arguments and is thus
trivially independent of row data, and appears in a WHERE clause being
compared to constants. Why wouldn't the optimizer treat this case the
same as IMMUTABLE?


Toy problem
--

This abstracts a scenario where a web application stores client
identity attributes (i.e. roles or groups) into a config variable
which can be accessed from a stored procedure to allow policy checks
against that client context.

The example query emulates a policy that is a disjunction of two
rules:

  1. clients who intersect a constant ACL
 '{C,E,Q}'::text[] && current_attributes()

  2. rows whose "cls" intersect a constant mask
 cls = ANY('{0,1,5}'::text[])

This is a common idiom for us, where some rows are restricted from the
general user base but certain higher privilege clients can see all
rows.

So our test query is simply:

  SELECT *
  FROM mydata
  WHERE '{C,E,Q}'::text[] && current_attributes()
 OR cls = ANY('{0,1,5}'::text[])
 ;


Test setup
--

I set a very high cost on the function to attempt to encourage the
planner to optimize away the calls, but it doesn't seem to make any
difference.  Based on some other discussions I have seen, I also tried
declaring it as LEAKPROOF but saw no change in behavior there either.

  CREATE OR REPLACE FUNCTION current_attributes() RETURNS text[]
  STABLE COST 100
  AS $$
  BEGIN
RETURN current_setting('mytest.attributes')::text[];
  END;
  $$ LANGUAGE plpgsql;

  CREATE TABLE mydata (
id serial PRIMARY KEY,
val integer,
cls text
  );

  INSERT INTO mydata (val, cls)
  SELECT v, (v % 13)::text
  FROM generate_series(1, 100, 1) AS s (v);

  CREATE INDEX ON mydata(cls);
  ANALYZE mydata;


Resulting plans and performance
--

These results are with PostgreSQL 9.5 on a Fedora 25 workstation but I
see essentially the same behavior on 9.6 as well.

For an intersecting ACL scenario, I set client context as:

  SELECT set_config('mytest.attributes', '{A,B,C,D}', False);

and for non-intersecting, I set:

  SELECT set_config('mytest.attributes', '{}', False);

In an ideal world, the planner knows it can solve the ACL intersection
once, independent of any row data and then form a different plan
optimized around that answer, the same as if we'd just put a constant
true or false term in our WHERE clause.


A. STABLE function for intersecting ACL

   Seq Scan on mydata  (cost=0.00..2500021656.00 rows=238030 width=10) (actual 
time=0.053..1463.382 rows=100 loops=1)
 Filter: (('{C,E,Q}'::text[] && current_attributes()) OR (cls = ANY 
('{0,1,5}'::text[])))
   Planning time: 0.093 ms
   Execution time: 1500.395 ms

B. IMMUTABLE function for intersecting ACL

   Seq Scan on mydata  (cost=0.00..15406.00 rows=100 width=10) (actual 
time=0.009..78.474 rows=100 loops=1)
   Planning time: 0.247 ms
   Execution time: 117.610 ms

C. STABLE function for non-intersecting ACL

   Seq Scan on mydata  (cost=0.00..2500021656.00 rows=238030 width=10) (actual 
time=0.179..1190.484 rows=230770 loops=1)
 Filter: (('{C,E,Q}'::text[] && current_attributes()) OR (cls = ANY 
('{0,1,5}'::text[])))
 Rows Removed by Filter: 769230
   Planning time: 0.088 ms
   Execution time: 1199.729 ms

D. IMMUTABLE function for non-intersecting ACL

   Bitmap Heap Scan on mydata  (cost=4058.36..12631.44 rows=230333 width=10) 
(actual time=32.444..76.934 rows=230770 loops=1)
 Recheck Cond: (cls = ANY ('{0,1,5}'::text[]))
 Heap Blocks: exact=5406
 ->  Bitmap Index Scan on mydata_cls_idx  (cost=0.00..4000.78 rows=230333 
width=0) (actual time=31.012..31.012 rows=230770 loops=1)
   Index Cond: (cls = ANY ('{0,1,5}'::text[]))
   Planning time: 0.331 ms
   Execution time: 87.475 ms

You can see the roughly 10-15x performance difference above. In our
real application with more sprawling data, sorting, and lots of
available column indices, the effects are even more pronounced.

Is there any hope for the planner to start optimizing these
row-independent, stable function calls the same way it does immutable
ones?

We tend to start a transaction, set the config parameter with the web
client identity attributes, and then run the other
performance-sensitive statements to completion (or error) in the same
transaction.  We don't further modify the parameter during the
lifetime of one web request handler.  We cycle through different
parameter settings only when we reuse a connection for multiple web
requests which may all be from different clients.

Is i

Re: [PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread Tom Lane
Karl Czajkowski  writes:
> The query planner does not seem to
> recognize that it can eliminate redundant calls to a STABLE function.

No, it doesn't.

> In my case, the function call does not take any arguments and is thus
> trivially independent of row data, and appears in a WHERE clause being
> compared to constants. Why wouldn't the optimizer treat this case the
> same as IMMUTABLE?

"The same as IMMUTABLE" would be to reduce the function to a constant at
plan time, which would be the wrong thing.  It would be valid to execute
it only once at query start, but there's no built-in mechanism for that.

But you could force it by putting it in a sub-SELECT, that is if you
don't like the performance of

  SELECT ... slow_stable_function() ...

try this:

  SELECT ... (SELECT slow_stable_function()) ...

That works because it's an uncorrelated sub-query, which gets evaluated
just once per run.  But the overhead associated with that mechanism is
high enough that forcing it automatically for every stable function would
be a loser.  I'd recommend doing it only where it *really* matters.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread David G. Johnston
On Wed, Jan 18, 2017 at 3:54 PM, Tom Lane  wrote:

> Karl Czajkowski  writes:
> > The query planner does not seem to
> > recognize that it can eliminate redundant calls to a STABLE function.
>
> No, it doesn't.
>
> > In my case, the function call does not take any arguments and is thus
> > trivially independent of row data, and appears in a WHERE clause being
> > compared to constants. Why wouldn't the optimizer treat this case the
> > same as IMMUTABLE?
>
> "The same as IMMUTABLE" would be to reduce the function to a constant at
> plan time, which would be the wrong thing.  It would be valid to execute
> it only once at query start, but there's no built-in mechanism for that.
>

​I'm feeling a bit dense here but even after having read a number of these
kinds of interchanges I still can't get it to stick.  I think part of the
problem is this sentence from the docs:

https://www.postgresql.org/docs/current/static/xfunc-volatility.html

(Stable): "​This category allows the optimizer to optimize multiple calls
of the function to a single call"

I read that sentence (and the surrounding paragraph) and wonder why then
doesn't it do so in this case.

If PostgreSQL cannot execute it only once at query start then all this talk
about optimization seems misleading.  At worse there should be an sentence
explaining when the optimizations noted in that paragraph cannot occur -
and probably examples of both as well since its not clear when it can occur.

Some TLC to the docs here would be welcomed.

David J.


Re: [PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread Karl Czajkowski
On Jan 18, Tom Lane modulated:
> "The same as IMMUTABLE" would be to reduce the function to a constant at
> plan time, which would be the wrong thing.  It would be valid to execute
> it only once at query start, but there's no built-in mechanism for that.
> 

That's what I was afraid of... right, we'd like for it to partially
evaluate these and then finish the plan.

Is there a correctness hazard with pretending our function is
IMMUTABLE, even though we will change the underlying config parameter
in the same connection?  It would be very bad if we changed our
parameter to reflect a different web client identity context, but then
somehow got cached plans based on the previous setting...

> But you could force it by putting it in a sub-SELECT, that is if you
> don't like the performance of
> 
>   SELECT ... slow_stable_function() ...
> 
> try this:
> 
>   SELECT ... (SELECT slow_stable_function()) ...
> 

Ha, you might recall a while back I was having problems with
round-tripping our RLS policies because I had tried such sub-selects
which return arrays and the internal format lost the casts needed to
get the correct parse when reloading a dump... :-)


karl



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread Tom Lane
"David G. Johnston"  writes:
> ​I'm feeling a bit dense here but even after having read a number of these
> kinds of interchanges I still can't get it to stick.  I think part of the
> problem is this sentence from the docs:
> https://www.postgresql.org/docs/current/static/xfunc-volatility.html

> (Stable): "​This category allows the optimizer to optimize multiple calls
> of the function to a single call"

> I read that sentence (and the surrounding paragraph) and wonder why then
> doesn't it do so in this case.

It says "allows", it doesn't say "requires".

The reason we have this category is that without it, it would be formally
invalid to optimize an expression involving a non-immutable function into
an index comparison value, because in that context the function is indeed
only evaluated once (before the comparison value is fed into the index
machinery).  But there isn't a mechanism for that behavior outside of
index scans.

> If PostgreSQL cannot execute it only once at query start then all this talk
> about optimization seems misleading.  At worse there should be an sentence
> explaining when the optimizations noted in that paragraph cannot occur -
> and probably examples of both as well since its not clear when it can occur.

If you want an exact definition of when things will happen or not happen,
start reading the source code.  I'm loath to document small optimizer
details since they change all the time.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread Tom Lane
Karl Czajkowski  writes:
> Is there a correctness hazard with pretending our function is
> IMMUTABLE, even though we will change the underlying config parameter
> in the same connection?

You could probably get away with that if you never ever use prepared
queries (beware that almost anything in plpgsql is a prepared query).
It's a trick that's likely to bite you eventually though.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread David G. Johnston
On Wed, Jan 18, 2017 at 4:23 PM, Tom Lane  wrote:

> "David G. Johnston"  writes:
> > ​I'm feeling a bit dense here but even after having read a number of
> these
> > kinds of interchanges I still can't get it to stick.  I think part of the
> > problem is this sentence from the docs:
> > https://www.postgresql.org/docs/current/static/xfunc-volatility.html
>
> > (Stable): "​This category allows the optimizer to optimize multiple calls
> > of the function to a single call"
>


> If PostgreSQL cannot execute it only once at query start then all this
> talk
> > about optimization seems misleading.  At worse there should be an
> sentence
> > explaining when the optimizations noted in that paragraph cannot occur -
> > and probably examples of both as well since its not clear when it can
> occur.
>
> If you want an exact definition of when things will happen or not happen,
> start reading the source code.  I'm loath to document small optimizer
> details since they change all the time.
>

​That would not be a productive exercise for me, or most people who just
want
some idea of what to expect in terms of behavior when they write and use a
Stable function (Immutable and Volatile seem fairly easy to reason about).

Is there anything fatally wrong with the following comprehension?

"""
A STABLE function cannot modify the database and is guaranteed to
return the same results given the same arguments for all rows
within a single statement.

This category allows the optimizer to take an expression of the form
(indexed_column = stable_function(...)) and evaluate stable_function(...)
once at the beginning of the query and use the result to scan
the index. (Since an index scan will evaluate the comparison
value only once, not once at each row, it is not valid to use a VOLATILE
 function in an index scan condition).  ?Note that should an index scan not
be
chosen for the plan the function will be invoked once-per-row?

Expressions of the forms (constant = stable_function()),
and (SELECT stable_function() FROM generate_series(1,5)) are not presently
optimized to a single per-query evaluation.  To obtain the equivalent you
can invoke the function in a sub-query or CTE and reference the result
wherever it is needed.
"""

It probably isn't perfect but if the average user isn't going to benefit
from
anything besides "index_column = function()" with an index plan then the
false hope that is being held due to the use of "allows + in particular"
should probably be dispelled.

Thanks!

David J.


Re: [PERFORM] optimizing immutable vs. stable function calls?

2017-01-18 Thread Karl Czajkowski
On Jan 18, Tom Lane modulated:
> Karl Czajkowski  writes:
> > Is there a correctness hazard with pretending our function is
> > IMMUTABLE, even though we will change the underlying config parameter
> > in the same connection?
> 
> You could probably get away with that if you never ever use prepared
> queries (beware that almost anything in plpgsql is a prepared query).
> It's a trick that's likely to bite you eventually though.
> 

That sounds unnerving. I think I need to play it safe. :-/

Does the plan cache disappear with each connection/backend process?
Or is there also a risk of plans being shared between backends?

Would it be invasive or a small hack to have something like
"transaction-immutable" which can be precomputed during planning, like
immutable, but then must discard those plans at the end of the
transaction...?


karl



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance