Hi,

I'm sorry, but I'm afraid this won't make it into PG18 :-(

AFAICS the updated patch is correct / not buggy for non-shared hash
tables, but considering I missed a pretty serious flaw before pushing
the original patch, and that the tests didn't catch that either ...

The risk/benefit is not really worth it for PG18. I think this needs a
bit more work/testing. The dynahash is a bit too critical piece, I don't
want to be reverting a patch a second time.

Sorry, I realize it's not a great outcome ...

On 4/4/25 17:45, Rahila Syed wrote:
> Hi,
> 
> Analysis of the Bug <https://www.postgresql.org/message-
> id/3d36f787-8ba5-4aa2-abb1-7ade129a7b0e%40vondra.me> in 0002 reported by
> David Rowley :
> The 0001* patch allocates memory for the hash header, directory, segments,
> and elements collectively for both shared and non-shared hash tables. While
>  this approach works well for shared hash tables, it presents an issue
> for non-
> shared hash tables. Specifically, during the expand_table() process,
> non-shared hash tables may reallocate a new directory and free the old one.
> Since the directory's memory is no longer allocated individually, it
> cannot be freed
> separately. This results in the following error:
> 
> ERROR: pfree called with invalid pointer 0x60a15edc44e0 (header
> 0x0000002000000008)
> 
> These allocations are done together to improve reporting of shared
> memory allocations
> for shared hash tables. Similar change is done for non-shared hash
> tables only to maintain
> consistency since hash_create code is shared between both types of hash
> tables.
> 

Thanks, that matches my conclusions after looking into this.

> One solution could be separating allocation of directory from rest of
> the allocations for 
> the non-shared hash tables, but this approach would undermine the
> purpose of
> doing the change for a non-shared hash table.
> 

I don't follow. Why would that undermine the change for non-shared hash
tables? Why should this affect non-shared hash tables at all? The goal
was to improve accounting for shared hash tables.

> A better/safer solution would be to do this change only for shared hash
> tables and
> exclude the non-shared hash tables.
> 
> I believe it's acceptable to allocate everything in a single block
> provided we are not trying
> to individually free any of these, which we do only for the directory
> pointer in dir_realloc.
> Additional segment allocation goes through seg_alloc and element
> allocations through
> element_alloc, which do not free existing chunks but instead allocate
> new ones with
> pointers in existing directories and segments.
> Thus, as long as we don't reallocate the directory, which we avoid in
> the case of shared
> hash tables, it should be safe to proceed with this change.
> 

Right, I believe that's correct. But I admit I find the current dynahash
code a bit confusing, especially in how it mixes handling of non-shared
and shared hash tables. (Not the fault of this patch, ofc.)

> Please find attached the patch which removes the changes for non-shared
> hash tables
> and keeps them for shared hash tables.
> 
> I tested this by running make-check, make-check world and the reproducer
> script shared
> by David. I also ran pgbench to test creation and expansion of some of the
> shared hash tables.
> 

Thanks. I'm afraid the existing regression tests can't tell us much,
considering it didn't fail once with the reverted patch :-(

I did check the coverage in:

https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html

and sure enough, dir_realloc() is not executed once. And there's a
couple more pieces that have the same issue (e.g. hash_freeze or parts
of get_hash_entry).

I realize that's not the fault of this patch, but would you be willing
to improve that? It'd certainly make me less concerned if we improved
this first.


regards

-- 
Tomas Vondra



Reply via email to