On 10.08.2019 5:12, Craig Ringer wrote:
> On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik
> <k.knizh...@postgrespro.ru> wrote:
>> Ok, here it is: global_private_temp-1.patch
> Fantastic.
>
> I'll put that high on my queue.
>
> I'd love to see something like this get in. Doubly so if it brings us
> closer to being able to use temp tables on physical read replicas,
> though I know there are plenty of other barriers there (not least of
> which being temp tables using persistent txns, not vtxids).
>
> Does it have a CF entry?
https://commitfest.postgresql.org/24/2233/
>> Also I have attached updated version of the global temp tables
>> with shared buffers - global_shared_temp-1.patch
> Nice to see that split out. In addition to giving the first patch more
> hope of being committed this time around, it'll help with readability
> and testability too.
>
> To be clear, I have long wanted to see PostgreSQL have the "session"
> state abstraction you have implemented. I think it's really important
> for high client count OLTP workloads, working with the endless
> collection of ORMs out there, etc. So I'm all in favour of it in
> principle, so long as it can be made to work reliably with limited
> performance impact on existing workloads, and without making life much
> harder when adding new core functionality, for extension authors, etc.
>
> The same goes for built-in pooling. I think PostgreSQL has needed some
> sort of separation of "connection", "backend", "session" and
> "executor" for a long time, and I'm glad to see you working on it.
> With that said: how do you intend to address the likelihood that this
> will cause performance regressions for existing workloads that use
> temp tables *without* relying on your session state and connection
> pooler? Consider workloads that use temp tables for mid-to-long txns
> where txn pooling is unimportant, and that also do plenty of read
> and write activity on persistent tables. Classic OLAP/DW stuff, e.g.:
>
> * four clients, four backends, four connections; session-level
>   connections that stay busy with minimal client sleeps
> * all sessions run the same bench code
> * transactions all read plenty of data from a medium to large
>   persistent table (think fact tables, etc)
> * transactions store a filtered, joined dataset with some pre-computed
>   window results or something in temp tables
> * the benchmark workload makes big-ish temp tables to store
>   intermediate data for its medium-length transactions
> * transactions also write to some persistent relations, say to record
>   their summarised results
> How does it perform with and without your patch? I'm concerned that:
>
> * the extra buffer locking and various IPC may degrade performance of
>   temp tables;
> * the temp table data in shared_buffers may put pressure on
>   shared_buffers space, evicting cached pages of persistent tables
>   that all sessions are sharing;
> * the temp table data in shared_buffers may put pressure on
>   shared_buffers space for dirty buffers, forcing writes of persistent
>   tables out earlier and therefore reducing write-combining
>   opportunities.
I agree that access to local buffers is cheaper than access to shared
buffers because there is no lock overhead. The fact that access to
local tables cannot affect cached data of persistent tables is also
important.
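This is easy to observe with the stock pg_buffercache extension (a
sketch only; the table name here is invented):

    -- Pages of a local temporary table live in the backend-local buffer
    -- cache, so they should never show up in shared_buffers.
    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    CREATE TEMP TABLE scratch AS
      SELECT g AS id, md5(g::text) AS payload
      FROM generate_series(1, 100000) g;

    SELECT count(*)            -- expected: 0
    FROM pg_buffercache
    WHERE relfilenode = pg_relation_filenode('scratch');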
But most Postgres tables are still normal (persistent) tables accessed
through shared buffers, and a huge amount of effort has been spent on
making this access as efficient as possible (a clock replacement
algorithm that doesn't require a global lock, atomic operations, ...).
Also, for some workloads, using the same replacement policy for all
tables may be preferable.
So it is not obvious to me that in the described scenario a local
buffer cache for temporary tables will really provide a significant
advantage. It will be interesting to do some benchmarking, and I am
going to do it.
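For instance, a transaction of roughly the following shape should
exercise the scenario described above (just a sketch; all table names
are invented for illustration):

    BEGIN;

    -- Read plenty of data from a medium/large persistent fact table and
    -- store a filtered, joined dataset with pre-computed window results
    -- in a temporary table.
    CREATE TEMP TABLE tmp_summary ON COMMIT DROP AS
      SELECT f.customer_id,
             f.order_date,
             sum(f.amount) OVER (PARTITION BY f.customer_id
                                 ORDER BY f.order_date) AS running_total
      FROM fact_orders f
           JOIN dim_customers c ON c.id = f.customer_id
      WHERE f.order_date >= now() - interval '90 days';

    -- Traverse the intermediate result...
    SELECT customer_id, max(running_total)
    FROM tmp_summary
    GROUP BY customer_id;

    -- ...and record summarised results in a persistent relation.
    INSERT INTO report_totals (customer_id, total)
      SELECT customer_id, max(running_total)
      FROM tmp_summary
      GROUP BY customer_id;

    COMMIT;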
What I have observed so far is that in the typical scenario of dumping
the results of a huge query into a temporary table and then traversing
that table, the old (local) temporary tables provide better
performance, perhaps because of the small size of the local buffer
cache and its different eviction policy. But subsequent accesses to a
global temporary table kept in shared buffers are faster, because the
table fits entirely in the large shared buffer cache.
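The difference is easy to see with EXPLAIN: for a local temporary
table the plan reports backend-local buffer usage, while a table kept
in shared buffers reports shared hits and reads (table name invented):

    -- "Buffers: local ..." for a local temp table,
    -- "Buffers: shared ..." for a table living in shared buffers.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM tmp_summary;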
There is one more problem with global temporary tables for which I do
not know a good solution yet: collecting statistics. Since each
backend has its own data, different backends may in general need
different query plans. Right now, if you run ANALYZE on such a table
in one backend, it affects plans in all backends. This can be
considered a feature rather than a bug if we assume that the
distribution of data is similar in all backends. But if this
assumption does not hold, it can be a problem.
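A minimal illustration of the issue, assuming a global temporary table
gtt(id int) created with the patch applied:

    -- Session 1: few rows; ANALYZE stores "gtt is tiny" in pg_statistic.
    INSERT INTO gtt SELECT generate_series(1, 10);
    ANALYZE gtt;

    -- Session 2: millions of rows, but the planner still sees the
    -- statistics collected by session 1, because pg_statistic has one
    -- entry per table, not per backend.
    INSERT INTO gtt SELECT generate_series(1, 10000000);
    EXPLAIN SELECT * FROM gtt WHERE id < 100;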