On 23.04.2018 23:14, Robert Haas wrote:
On Wed, Apr 18, 2018 at 9:41 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
Well, maybe I missed something, but I do not know how to efficiently support
1. Temporary tables
2. Prepared statements
3. Session GUCs
with any external connection pooler (with pooling level other than
session).
Me neither. What makes it easier to do these things in an internal
connection pooler? What could the backend do differently, to make these
easier to implement in an external pooler?
I think you and Konstantin are possibly failing to see the big picture
here.  Temporary tables, prepared statements, and GUC settings are
examples of session state that users expect will be preserved for the
lifetime of a connection and not beyond; all session state, of
whatever kind, has the same set of problems.  A transparent connection
pooling experience means guaranteeing that no such state vanishes
before the user ends the current session, and also that no such state
established by some other session becomes visible in the current
session.  And we really need to account for *all* such state, not just
really big things like temporary tables and prepared statements and
GUCs but also much subtler things such as the state of the PRNG
established by srandom().

It is not quite true that I have not realized these issues.
In addition to connection pooling, I have also implemented a pthread version of Postgres, in which static variables are replaced with thread-local variables so that each thread uses its own set of variables.

Unfortunately, this approach can not be used with connection pooling.
But I think that performing scheduling at the transaction level will eliminate the problem with static variables in most cases. My expectation is that very few of them have session-level lifetime. Unfortunately, it is not so easy to locate all such places. Once such variables are located, they can be saved in the session context and restored on reschedule.

A more challenging thing is to handle system static variables which can not be easily saved/restored. Your example with srandom() is exactly such a case. Right now I do not know any efficient way to suspend/resume a pseudo-random sequence. But frankly speaking, I do not think that such behaviour of random() is completely unacceptable or makes a built-in session pool unusable.



This is really very similar to the problem that parallel query has
when spinning up new worker backends.  As far as possible, we want the
worker backends to have the same state as the original backend.
However, there's no systematic way of being sure that every relevant
backend-private global, including perhaps globals added by loadable
modules, is in exactly the same state.  For parallel query, we solved
that problem by copying a bunch of things that we knew were
commonly-used (cf. parallel.c) and by requiring functions to be
labeled as parallel-restricted if they rely on any other state.
The problem for connection pooling is much harder.  If you only ever
ran parallel-safe functions throughout the lifetime of a session, then
you would know that the session has no "hidden state" other than what
parallel.c already knows about (except for any functions that are
mislabeled, but we can say that's the user's fault for mislabeling
them).  But as soon as you run even one parallel-restricted or
parallel-unsafe function, there might be a global variable someplace
that holds arbitrary state which the core system won't know anything
about.  If you want to have some other process take over that session,
you need to copy that state to the new process; if you want to reuse
the current process for a new session, you need to clear that state.
Since you don't know it exists or where to find it, and since the code
to copy and/or clear it might not even exist, you can't.

In other words, transparent connection pooling is going to require
some new mechanism, which third-party code will have to know about,
for tracking every last bit of session state that might need to be
preserved or cleared.  That's going to be a big project.  Maybe some
of that can piggyback on existing infrastructure like
InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
state to deal with.  And no out-of-core pooler has a chance of
handling all that stuff correctly; an in-core pooler will be able to
do so only with a lot of work.

I think that the situation with parallel executors is slightly different: in that case several backends execute the same query,
so they really need to somehow share/synchronize the state of static variables.
But with connection pooling, only one transaction is executed by a backend at each moment of time. And there should be no problem with static variables unless they cross a transaction boundary. I do not think that there are many such variables.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

