Hi,

On 2022-08-01 17:42:49 +0100, Simon Riggs wrote:
> The reason for the slowdown is clear: when we overflow we check every
> xid against subtrans, producing a large stream of lookups. Some
> previous hackers have tried to speed up subtrans - this patch takes a
> different approach: remove as many subtrans lookups as possible. (So
> is not competing with those other solutions).
>
> Attached patch improves on the situation, as also shown in the attached
> diagram.
I think we should consider redesigning subtrans more substantially - even
with the changes you propose here, there are still plenty of ways to hit
really bad performance. And there's only so much we can do about that
without more fundamental design changes.

One way to fix a lot of the issues around pg_subtrans would be to remove
the pg_subtrans SLRU and replace it with a purely in-memory hashtable.

IMO there's really no good reason to use an SLRU for it (anymore). In
contrast to e.g. clog or multixact we don't need to access a lot of old
entries, and we don't need persistence etc. Nor is it a good use of memory
and IO to have loads of pg_subtrans pages that don't point anywhere,
because the xid is just a "normal" xid.

While we can't put a useful hard cap on the number of potential subtrans
entries (we can only throw subxid->parent mappings away once no existing
snapshot might need them), saying that there can't be more subxids
"considered running" at a time than can fit in memory doesn't seem like a
particularly problematic restriction.

So, why don't we use a dshash table with some amount of statically
allocated memory for the mapping? In common cases that will *reduce*
memory usage (because we don't need to reserve space for [as many]
subxids in snapshots / procarray anymore) and IO (no mostly-zeroes
pg_subtrans).

Greetings,

Andres Freund