Hi, I'm sorry, but I'm afraid this won't make it into PG18 :-(
AFAICS the updated patch is correct / not buggy for non-shared hash tables, but considering I missed a pretty serious flaw before pushing the original patch, and that the tests didn't catch that either ... The risk/benefit is not really worth it for PG18. I think this needs a bit more work/testing. The dynahash is a bit too critical piece, I don't want to be reverting a patch a second time. Sorry, I realize it's not a great outcome ... On 4/4/25 17:45, Rahila Syed wrote: > Hi, > > Analysis of the Bug <https://www.postgresql.org/message- > id/3d36f787-8ba5-4aa2-abb1-7ade129a7b0e%40vondra.me> in 0002 reported by > David Rowley : > The 0001* patch allocates memory for the hash header, directory, segments, > and elements collectively for both shared and non-shared hash tables. While > this approach works well for shared hash tables, it presents an issue > for non- > shared hash tables. Specifically, during the expand_table() process, > non-shared hash tables may reallocate a new directory and free the old one. > Since the directory's memory is no longer allocated individually, it > cannot be freed > separately. This results in the following error: > > ERROR: pfree called with invalid pointer 0x60a15edc44e0 (header > 0x0000002000000008) > > These allocations are done together to improve reporting of shared > memory allocations > for shared hash tables. Similar change is done for non-shared hash > tables only to maintain > consistency since hash_create code is shared between both types of hash > tables. > Thanks, that matches my conclusions after looking into this. > One solution could be separating allocation of directory from rest of > the allocations for > the non-shared hash tables, but this approach would undermine the > purpose of > doing the change for a non-shared hash table. > I don't follow. Why would that undermine the change for non-shared hash tables? Why should this affect non-shared hash tables at all? The goal was to improve accounting for shared hash tables. > A better/safer solution would be to do this change only for shared hash > tables and > exclude the non-shared hash tables. > > I believe it's acceptable to allocate everything in a single block > provided we are not trying > to individually free any of these, which we do only for the directory > pointer in dir_realloc. > Additional segment allocation goes through seg_alloc and element > allocations through > element_alloc, which do not free existing chunks but instead allocate > new ones with > pointers in existing directories and segments. > Thus, as long as we don't reallocate the directory, which we avoid in > the case of shared > hash tables, it should be safe to proceed with this change. > Right, I believe that's correct. But I admit I find the current dynahash code a bit confusing, especially in how it mixes handling of non-shared and shared hash tables. (Not the fault of this patch, ofc.) > Please find attached the patch which removes the changes for non-shared > hash tables > and keeps them for shared hash tables. > > I tested this by running make-check, make-check world and the reproducer > script shared > by David. I also ran pgbench to test creation and expansion of some of the > shared hash tables. > Thanks. I'm afraid the existing regression tests can't tell us much, considering it didn't fail once with the reverted patch :-( I did check the coverage in: https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html and sure enough, dir_realloc() is not executed once. And there's a couple more pieces that have the same issue (e.g. hash_freeze or parts of get_hash_entry). I realize that's not the fault of this patch, but would you be willing to improve that? It'd certainly make me less concerned if we improved this first. regards -- Tomas Vondra