On Thu, 30 Jan 2025 at 08:48, Matthias van de Meent <boekewurm+postg...@gmail.com> wrote: > > Together that results in the following prototype patchset.
Here's an alternative patch, which replaces dynahash in the buffer lookup table with an open-coded replacement that has fewer indirections during lookups, and allows for a much increased locality. Overall, it has less memory usage than our current dynahash in all cases; but compared to my previous patch it improves memory only in some cases, in others it uses more memory. The main design is a separate chaining hash table (similar to and derived from code of Dynahash) to provide efficient access to free elements. The innovation over Dynahash is that in this new hash table the stored elements are slotted between the bucket's element pointers, thus allowing access to hash elements without further cache misses if the element that stores the bucket's data is stored on the same cache line (very possible). Further, the implementation uses int-based offset pointers, reducing pointer size overhead on 64-bit system and limiting the hash table to 2^31 elements (for a buffer mapping table that seems to be fine, given our current buffer count limit of 2^30-1). Finally, it shows up with accurate memory usage in pg_shmem_allocations due to the usage of a single allocation, improving user debugability of resources vs dynahash. 0001 implements this hash table, with some minimal optimizations. 0002 adds the optimization where the hash table prefers to pick the local hash element if it's not already in use. Future optimizations could implement memory prefetching (so that buffer tag compares won't have cache misses for memory stalls), cacheline-local entry selection, and other optimizations, but that'll need more effort which I don't want to put into it before getting a green light on "this might actually be a good step in the right direction". As to why simplehash was not chosen: Simplehash doesn't (can't?) support partitioning due to its open addressing model. By taking the locality of open addressing and combining it with the partitioning capabilities of separate chaining we get the best of both worlds (or at least closer than dynahash got). Kind regards, Matthias van de Meent Neon (https://neon.tech) PS. This isn't a perfect solution; it still causes approximately random memory accesses when we look up sequential pages of a relation. However, the cost of these memory accesses can now be significantly reduced without breaking APIs or significantly impacting memory overheads. Replacing the hashtable with another seems fairly straightforward, while replacing the hashtable with e.g. radixtree.h would be a very significant undertaking of shared memory allocations that I'm not comfortable with writing myself.
v1-0001-BufTable-Use-optimized-hash-table-for-buffer-look.patch
Description: Binary data
v1-0002-BufTable-Optimize-allocation-of-BufTable-entries.patch
Description: Binary data