Re: [HACKERS] Cached plans and statement generalization

Konstantin Knizhnik Thu, 11 May 2017 12:42:27 -0700

On 05/11/2017 06:12 PM, Bruce Momjian wrote:

On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:

I am going to continue work on this patch I will be glad to receive any
feedback and suggestions for its improvement.
In most cases, applications are not accessing Postgres directly, but using
some connection pooling layer and so them are not able to use prepared
statements.
But at simple OLTP Postgres spent more time on building query plan than on
execution itself. And it is possible to speedup Postgres about two times at
such workload!
Another alternative is true shared plan cache.  May be it is even more
perspective approach, but definitely much more invasive and harder to
implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:


        Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though am not sure if
our prepared query system even handles that properly.

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.

Sorry, for luck of overview.
I have started with small prototype just to investigate if such optimization 
makes sense or not.
When I get more than two time advantage in performance on standard pgbench, I 
come to conclusion that this
optimization can be really very useful and now try to find the best way of its 
implementation.

I have started with simplest approach when string literals are replaced with 
parameters. It is done before parsing.
And can be done very fast - just need to locate data in quotes.
But this approach is not safe and universal: you will not be able to speedup 
most of the existed queries without rewriting them.

This is why I have provided second implementation which replace literals with 
parameters after raw parsing.
Certainly it is slower than first approach. But still provide significant 
advantage in performance: more than two times at pgbench.
Then I tried to run regression tests and find several situations where type 
analysis is not correctly performed in case of replacing literals with 
parameters.

So my third attempt is to replace constant nodes with parameters in already 
analyzed tree.

Now answering your questions:

*  What information is stored about cached plans?

Key to locate cached plan is raw parse tree. Value is saved CachedPlanSource.

*  How are the cached plans invalidated?

In the same way as plans for explicitly prepared statements.

*  How is a query matched against a cached plan?

Hash function is calculated for raw parse tree and equal() function is used to 
compare raw plans with literal nodes replaced with parameters.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cached plans and statement generalization

Reply via email to