On 18/11/2024 02:01, Jeff Davis wrote:
TupleHashEntryData is the size of a hash bucket, and it's currently 24
bytes. The size of the structure is important for HashAgg and other
callers that can build up large hashtables of small tuples. Waste in
this structure is even worse when the hash table is sparse, because the
space is consumed whether the bucket is occupied or not.
The attached patch series brings it down to 12. There are several
unappealing hacks, so ideally we'd find something cleaner, but halving
the bucket size seems quite desirable if it can be done in an
acceptable way.
Makes sense.
0001: This patch makes the structure private, so that it's easier to
control the way the structure is accessed.
Seems pretty uncontroversial. You removed the typedef for struct
TupleHashEntryData, which is a bit unusual for our usual source style.
Was there a reason for that?
0002: Removes the "additional" pointer, instead storing it right after
the tuple, which is already stored in a separate chunk. Hack: this
crams the "additional" pointer after the MinimalTuple in the same chunk
of memory to avoid adding additional palloc()s.
Hmm, it would seem more straightforward to store it in the beginning,
i.e. have something like this:
struct {
void *additional;
MinimalTupleData mtup;
} ;
Come to think of it, how important is it that we use MinimalTuple here
at all? Some other representation could be faster to deal with in
TupleHashTableMatch() anyway.
0003: simplehash: allow the caller to decide which entries are empty vs
in-use rather than requiring a separate "status" field. This may limit
other possible status values in the future (i.e. adding on to the
enum), but I'm not sure what those other stutus values might be.
+1. I've wanted to have this in the past.
0004: Removes the "status" field from TupleHashEntryData, using
firstTuple==NULL to mean "empty", otherwise "in use". Hack: need an
additional "special" pointer value to mean "input slot" now that NULL
means "empty".
+1
0005: Pack TupleHashEntryData. IIUC, this is fine even on machines that
can't do unaligned access, so long as we are accessing the fields
through the struct, and not taking the address of individual members.
Seems OK.
I wonder if the compiler understands that the elements are still 4-byte
aligned, or if it forces byte-per-byte access? Playing with godbolt a
little, it seems like GCC at least understands it, but clang does not.
On architectures with non-strict alignment, it doesn't matter as a
simple load/store instruction is the fastest option anyway.
--
Heikki Linnakangas
Neon (https://neon.tech)