On Tue, 3 Sept 2024 at 18:20, Tomas Vondra <to...@vondra.me> wrote:
> FWIW the actual cost is somewhat higher, because we seem to need ~400B
> for every lock (not just the 150B for the LOCK struct).
We do indeed allocate two PROCLOCKs for every LOCK, and we allocate those inside dynahash tables. That amounts to (152 + 2*64 + 3*16 =) 328 bytes in dynahash elements, plus (3 * 8-16 =) 24-48 bytes for the dynahash buckets/segments, resulting in 352-376 bytes * NLOCKENTS() being used [^1]. Does that align with your usage numbers, or are they significantly larger?

> At least based on a quick experiment. (Seems a bit high, right?)

Yeah, that does seem high; thanks for nerd-sniping me.

The 152 bytes of LOCK are mostly due to two MAX_LOCKMODES-sized int arrays that track the number of requested and granted locks of each mode. As MAX_LOCKMODES = 10, these arrays use a total of 2*4*10 = 80 bytes, with the remaining 72 spent on everything else. MAX_BACKENDS sadly doesn't fit in int16, so we'll have to keep using int arrays, but that doesn't mean we can't reduce their size:

ISTM that MAX_LOCKMODES is 2 larger than it has to be: LOCKMODE 0 is NoLock, which is never used or counted in these shared structures, and the highest lock mode supported by any of the supported lock methods is AccessExclusiveLock (8). We can thus reduce MAX_LOCKMODES to 8, shrinking the LOCK struct by 16 bytes.

If some struct and field packing is OK, we could shave off a further 8 bytes by narrowing the LOCKMASK type from int to int16 (we only use the first MaxLockMode (8) + 1 bits) and storing the grantMask/waitMask fields (now 4 bytes total) in the padding at the end of the waitProcs struct. This depends on dclist never writing into its padding space; I couldn't find any user that does, and most critically dclist_init doesn't scribble in the padding with memset.

With both implemented, we'd save 24 bytes, reducing the struct to 128 bytes.
That's exactly two cache lines. :) [^2]

I also checked PROCLOCK: if it's worth packing that struct further, we should probably look at replacing the PGPROC*-typed fields with ProcNumber-based ones, potentially in both PROCLOCK and PROCLOCKTAG. When combined with int16-typed LOCKMASKs, replacing either one of those fields with a ProcNumber would reduce the struct by one MAXALIGN quantum, to 56 bytes, the smallest I could get it without ignoring type alignments.

Further shmem savings can be had by reducing the "10% safety margin" added at the end of LockShmemSize: I'm fairly sure the memory used in the shared hash tables doesn't exceed the estimated amount, and if it did, we should probably fix that rather than request (up to) 10% overhead here.

Altogether that'd save 40 bytes/lock entry in struct sizes and ~35 bytes/lock in "safety margin", for a saving of (up to) 19% of our current allocation. I'm not sure whether these tricks would help performance or even hurt it, beyond smaller structs usually fitting better in CPU caches.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[^1] NLOCKENTS() benefits from being a power of 2, or slightly below one, as it's rounded up to a power of 2 when dynahash decides how many buckets to allocate.

[^2] Sadly, this 2-cacheline alignment is lost due to dynahash's HASHELEMENT prefix on each element. :(