Hi,

Sounds worth pursuing.
On 2022-07-13 10:09:50 -0700, Nathan Bossart wrote:
> The attached patch has some key differences from the previous proposal.
> For example, the new patch uses simplehash instead of open-coding a new
> hash table.

+1

> The attached patch waits until a lookup of [sub]xip before generating the
> hash table, so we only need to allocate enough space for the current
> elements in the [sub]xip array, and we avoid allocating extra memory for
> workloads that do not need the hash tables.

Hm. Are there any contexts where we'd not want the potential for failing
due to OOM?

I wonder if we additionally / alternatively could use a faster method of
searching the array linearly, e.g. using SIMD.

Another thing that might be worth looking into is to sort the xip/subxip
arrays into a binary-search optimized layout. That'd make the binary
search faster, wouldn't require additional memory (a boolean indicating
whether sorted somewhere, I guess), and would easily persist across
copies of the snapshot.

> I'm slightly worried about increasing the number of memory
> allocations in this code path, but the results above seemed encouraging on
> that front.

ISTM that the test wouldn't be likely to show those issues.

> These hash tables are regarded as ephemeral; they only live in
> process-local memory and are never rewritten, copied, or
> serialized.

What does rewriting refer to here?

Not convinced that's the right idea in case of copying. I think we often
end up copying snapshots frequently, and building & allocating the hashed
xids separately every time seems not great.

> +	snapshot->xiph = NULL;
> +	snapshot->subxiph = NULL;

Do we need separate hashes for these? ISTM that if we're overflowed then
we don't need ->subxiph, and if not, then the action for an xid being in
->xip and ->subxip is the same?

Greetings,

Andres Freund