Hi,

On 2022-08-01 17:42:49 +0100, Simon Riggs wrote:
> The reason for the slowdown is clear: when we overflow we check every
> xid against subtrans, producing a large stream of lookups. Some
> previous hackers have tried to speed up subtrans - this patch takes a
> different approach: remove as many subtrans lookups as possible. (So
> is not competing with those other solutions).
>
> Attached patch improves on the situation, as also shown in the attached
> diagram.
I think we should consider redesigning subtrans more substantially - even
with the changes you propose here, there are still plenty of ways to hit
really bad performance. And there's only so much we can do about that
without more fundamental design changes.

One way to fix a lot of the issues around pg_subtrans would be to remove
the pg_subtrans SLRU and replace it with a purely in-memory hashtable.

IMO there's really no good reason to use an SLRU for it (anymore). In
contrast to e.g. clog or multixact we don't need to access a lot of old
entries, and we don't need persistence etc. Nor is it a good use of memory
and IO to have loads of pg_subtrans pages that don't point anywhere,
because the xid is just a "normal" xid.

While we can't put a useful hard cap on the number of potential subtrans
entries (we can only throw subxid->parent mappings away once no existing
snapshot might need them), saying that there can't be more subxids
"considered running" at a time than can fit in memory doesn't seem like a
particularly problematic restriction.

So, why don't we use a dshash table with some amount of statically
allocated memory for the mapping? In common cases that will *reduce*
memory usage (because we don't need to reserve space for [as many]
subxids in snapshots / procarray anymore) and IO (no mostly-zeroes
pg_subtrans).

Greetings,

Andres Freund