On Mar 2, 2018, at 11:29 PM, Bruce Momjian wrote:

> On Thu, Feb 15, 2018 at 03:00:17PM +0100, Shay Rojansky wrote:
>> Just wanted to say that I've seen more than 10% improvement in some
>> real-world application when preparation was done properly. Also, I'm
>> assuming that implementing this wouldn't involve "rewriting substantial
>> part of Postgres core", and that even 10% is quite a big gain, especially
>> if it's a transparent/free one as far as the user is concerned (no
>> application changes).
> 
> I would like to step back on this issue.  Ideally, every query would get
> re-optimized because we can only be sure the plan is optimal when we use
> supplied constants to generate the plan.  But, of course, parsing and
> planning take time, so there ideally would be an way to avoid it.  The
> question is always how much time will be saved by avoiding
> parsing/planning, and what risk is there of suboptimal plans.
> 
> Postgres uses a conservative method for reusing plans with previous
> constants, as described in the PREPARE manual page:
> 
>       https://www.postgresql.org/docs/10/static/sql-prepare.html
>       Prepared statements can use generic plans rather than re-planning
>       with each set of supplied EXECUTE values. This occurs immediately
>       for prepared statements with no parameters; otherwise it occurs
>       only after five or more executions produce plans whose estimated
>       cost average (including planning overhead) is more expensive than
>       the generic plan cost estimate. Once a generic plan is chosen, it
>       is used for the remaining lifetime of the prepared statement.
>       Using EXECUTE values which are rare in columns with many duplicates
>       can generate custom plans that are so much cheaper than the generic
>       plan, even after adding planning overhead, that the generic plan
>       might never be used.
> 
> While I have heard people complain about how other databases cache
> prepare plans, I have heard few complaints about the Postgres approach,
> and I haven't even heard of people asking to control the documented "five
> or more" behavior.

I am sorry, but I think that the lack of complaints in this case just means 
that people are not using prepared statements in Postgres.
Almost any production system based on Postgres has to use some kind of 
connection pooling (pgbouncer, ...), and that prevents the use of prepared 
statements in most cases.
I was personally involved in the development of several such systems where 
prepared statements were prohibited because all connections to the database 
were expected to go through pgbouncer.

I completely agree with you that a generic plan can be less efficient than a 
specialized plan; that can be caused by data skew and non-uniform data 
distribution.
But what is the percentage of such queries? I do not have much experience 
investigating the behavior of Postgres in real production systems, so I can 
only guess. For the simplest queries, like "select * from T where key=?", it 
is very unlikely that a specialized plan will be more efficient, even with a 
non-uniform distribution of "key" values. And for complex plans involving 
multiple joins, the Postgres optimizer will in any case be unable to compute 
a reasonably precise estimate after a few joins, so a specialized plan with 
literals instead of placeholders will once again not be more efficient.

This rule of five attempts before switching to a generic plan works well for 
partitioning, where the generic plan causes the query to be executed against 
all partitions and so is several times less efficient than a specialized plan 
that can restrict execution to just one partition. But I think that in this 
case the right solution is runtime partition pruning, which would allow 
prepared statements to be used on tables with inheritance.
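For clarity, the five-execution heuristic quoted above can be sketched as a 
simplified model (this is illustrative Python, not the actual plancache.c 
code; the class, cost numbers, and threshold handling are assumptions):

```python
# Simplified model of PostgreSQL's generic-vs-custom plan choice for a
# prepared statement (loosely after plancache.c); all names and numbers
# here are illustrative, not the real implementation.

CUSTOM_PLAN_TRIES = 5  # custom-plan executions before generic is considered

class PlanSource:
    def __init__(self, generic_cost):
        self.generic_cost = generic_cost   # estimated cost of the generic plan
        self.num_custom_plans = 0
        self.total_custom_cost = 0.0
        self.using_generic = False

    def choose_plan(self, custom_cost, planning_overhead):
        """Return 'generic' or 'custom' for the next execution."""
        if self.using_generic:
            # Once chosen, the generic plan is used for the remaining
            # lifetime of the prepared statement.
            return 'generic'
        if self.num_custom_plans >= CUSTOM_PLAN_TRIES:
            avg_custom = (self.total_custom_cost / self.num_custom_plans
                          + planning_overhead)
            # Switch to the generic plan only if it is not more expensive
            # than the average custom plan including replanning overhead.
            if self.generic_cost <= avg_custom:
                self.using_generic = True
                return 'generic'
        self.num_custom_plans += 1
        self.total_custom_cost += custom_cost
        return 'custom'
```

In this model a partitioned table behaves exactly as described: if the 
generic plan (all partitions) is estimated much more expensive than the 
custom plan (one partition), the generic plan is never chosen, while for a 
simple key lookup the two costs are close and the switch happens after five 
executions.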


> 
> I also know that other database products have more sophisticated prepare
> usage, but they might have higher parse/plan overhead, or they might be
> more flexible in handling specialized workloads, which Postgres might
> not want to handle, given the costs/complexity/overhead.
> 
> So, the directions for improvement are:
> 
> 1  Improve the existing "five or more" behavior
> 2  Automatically prepare queries that are not sent as prepared queries
> 3  Share plans among sessions
> 
> While #1 would be nice, #2 increases the number of applications that can
> silently benefit from prepared queries, and #3 improves the number of
> cases that query plans can be reused.  The issue with #3 is that the
> constants used are no longer local to the session (which is the same
> issue with connection poolers reusing prepared plans).  When different
> sessions with potentially more varied constants reuse plans, the
> probability of suboptimal plans increases.
> 
> I think the fact that pgbench shows a 2x improvement for prepared
> statements, and real world reports are a 10% improvement means we need
> to have a better understanding of exactly what workloads can benefit
> from this, and a comprehensive analysis of all three areas of
> improvement.

I completely agree with you.
If we have autoprepared statements or shared plans, then more sophisticated 
checks of the efficiency of the generic plan are definitely needed.
I just want to add a few words about the estimated complexity of each of 
these approaches.

An autoprepare patch already exists and can improve performance on OLTP 
workloads by up to two times.
It would be nice to perform more experiments on real systems to estimate the 
actual effect of autoprepare, but until this patch is included in Postgres, 
such experiments will be hard to perform. It is very difficult to try the 
effect of prepared statements on a real system working through pgbouncer, 
whereas with autoprepare it would be enough to toggle one GUC variable to 
perform this experiment.
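The core idea of autoprepare can be sketched like this (illustrative Python 
only, not the actual patch: the threshold, the literal-matching regex, and 
all names here are assumptions):

```python
import re

# Rough sketch of autoprepare: each query's literal constants are replaced
# by placeholders to form a statement "shape"; shapes are counted, and once
# a shape has been seen enough times it would be turned into a prepared
# statement.  Threshold, regex and names are illustrative assumptions.

AUTOPREPARE_THRESHOLD = 5
_literal = re.compile(r"'[^']*'|\b\d+\b")  # string and integer literals

class Autoprepare:
    def __init__(self):
        self.counts = {}
        self.prepared = set()

    def normalize(self, sql):
        """Replace literal constants with $n placeholders."""
        n = 0
        def repl(m):
            nonlocal n
            n += 1
            return f'${n}'
        return _literal.sub(repl, sql)

    def on_query(self, sql):
        """Return 'prepared' if this statement shape would run prepared."""
        key = self.normalize(sql)
        if key in self.prepared:
            return 'prepared'
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= AUTOPREPARE_THRESHOLD:
            # Here the real patch would actually PREPARE the statement.
            self.prepared.add(key)
            return 'prepared'
        return 'plain'
```

The point is that the application keeps sending plain queries with literals, 
and the server transparently reuses a plan once the same shape repeats, so 
the benefit survives a pgbouncer-style pooler.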

Implementing a shared plan cache is definitely the most universal and 
efficient solution. But it will require a lot of effort, and even if we 
decided to move in this direction,
I do not think it would be ready for release 11 or even release 12 of 
Postgres.
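At its simplest, a shared plan cache is just a cross-session map from query 
shape to plan, which is why it interacts badly with session-local constants 
(a hypothetical sketch; every name and the plan representation here are 
assumptions, not a proposed design):

```python
import threading

# Illustrative cross-session ("shared") plan cache: plans are keyed by
# query text plus parameter types and protected by a lock, so any session
# can reuse a plan another session created.  All names and the plan object
# are hypothetical stand-ins, not PostgreSQL code.

class SharedPlanCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._plans = {}          # (query_text, param_types) -> plan
        self.hits = 0
        self.misses = 0

    def get_plan(self, query_text, param_types, planner):
        key = (query_text, tuple(param_types))
        with self._lock:
            plan = self._plans.get(key)
            if plan is not None:
                self.hits += 1    # another session already planned this query
                return plan
            self.misses += 1
            plan = planner(query_text, param_types)  # plan once, share after
            self._plans[key] = plan
            return plan
```

Note that such a cache reuses one plan across sessions with potentially very 
different constants, which is exactly the suboptimal-plan risk mentioned 
above for point #3.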

It's a pity that we give up as much as a 2x possible Postgres performance 
gain just because of the lack of a suitable mechanism for preparing queries.


> 
> -- 
>  Bruce Momjian  <br...@momjian.us>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
> 
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +
