On Thu, Apr 21, 2022 at 5:04 AM Yura Sokolov <y.soko...@postgrespro.ru> wrote:
> $ pid=`ps x | awk '/checkpointer/ && !/awk/ { print $1 }'`
> $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
> $1 = 16512
>
> $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
> ...
> $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
> $1 = 20439
>
> $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
> ...
> $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
> $1 = 20541
>
> It stabilizes at 20541
Hmm. So is the existing comment incorrect? Remember, I was complaining
about this change:

--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -481,10 +481,10 @@ StrategyInitialize(bool init)
  *
  * Since we can't tolerate running out of lookup table entries, we must be
  * sure to specify an adequate table size here.  The maximum steady-state
- * usage is of course NBuffers entries, but BufferAlloc() tries to insert
- * a new entry before deleting the old.  In principle this could be
- * happening in each partition concurrently, so we could need as many as
- * NBuffers + NUM_BUFFER_PARTITIONS entries.
+ * usage is of course NBuffers entries. But due to concurrent
+ * access to numerous free lists in dynahash we can miss free entry that
+ * moved between free lists. So it is better to have some spare free entries
+ * to reduce probability of entry allocations after server start.
  */
 InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

Pre-patch, the comment claims that the maximum number of buffer entries
that can be simultaneously used is limited to NBuffers +
NUM_BUFFER_PARTITIONS, and that's why we make the hash table that size.
The idea is that we normally need no more than 1 entry per buffer, but
sometimes we might have 2 entries for the same buffer if we're in the
process of changing the buffer tag, because we make the new entry
before removing the old one. To change the buffer tag, we need the
buffer mapping lock for the old partition and the new one, but if both
are the same, we need only one buffer mapping lock. That means that in
the worst case, you could have a number of processes equal to
NUM_BUFFER_PARTITIONS each in the process of changing the buffer tag
between values that both fall into the same partition, and thus each
using 2 entries. Then you could have every other buffer in use and thus
using 1 entry, for a total of NBuffers + NUM_BUFFER_PARTITIONS entries.

Now I think you're saying we go far beyond that number, and what I
wonder is how that's possible. (Notably, your starting value of 16512
is exactly 16384 + 128, which is NBuffers + NUM_BUFFER_PARTITIONS at
the default shared_buffers of 128MB, so 20541 would be several thousand
entries past the supposed cap.) If the system doesn't work the way the
comment says it does, maybe we ought to start by talking about what to
do about that.

I am a bit confused by your description of having done "p
SharedBufHash->hctl->allocated.value" because SharedBufHash is of type
HTAB and HTAB's hctl member is of type HASHHDR, which has no field
called "allocated". I thought maybe my analysis here was somehow
mistaken, so I tried the debugger, which took the same view of it that
I did:

(lldb) p SharedBufHash->hctl->allocated.value
error: <user expression 0>:1:22: no member named 'allocated' in 'HASHHDR'
SharedBufHash->hctl->allocated.value
~~~~~~~~~~~~~~~~~~~ ^

-- 
Robert Haas
EDB: http://www.enterprisedb.com
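P.S. For concreteness, here is a minimal sketch of the
insert-before-delete window the old comment is describing. This is not
the actual BufferAlloc() code: the deadlock-avoiding lock ordering, the
buffer header manipulation, usage counts, and all error handling are
omitted, and the function name is made up. Only BufTableHashCode(),
BufMappingPartitionLock(), BufTableInsert(), BufTableDelete(), and the
LWLock calls are the real interfaces:

    #include "postgres.h"
    #include "storage/buf_internals.h"
    #include "storage/lwlock.h"

    /*
     * Sketch only: re-tagging a buffer transiently consumes two
     * buffer-mapping entries, because the entry for the new tag is
     * inserted while the entry for the old tag still exists.
     */
    static void
    retag_buffer_sketch(BufferDesc *buf, BufferTag oldTag, BufferTag newTag)
    {
        uint32      oldHash = BufTableHashCode(&oldTag);
        uint32      newHash = BufTableHashCode(&newTag);
        LWLock     *oldPartitionLock = BufMappingPartitionLock(oldHash);
        LWLock     *newPartitionLock = BufMappingPartitionLock(newHash);

        LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
        if (newPartitionLock != oldPartitionLock)
            LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);

        /* Insert the mapping for the new tag first ... */
        (void) BufTableInsert(&newTag, newHash, buf->buf_id);

        /* ... at this point the buffer has two lookup-table entries ... */

        /* ... and only then remove the mapping for the old tag. */
        BufTableDelete(&oldTag, oldHash);

        if (newPartitionLock != oldPartitionLock)
            LWLockRelease(newPartitionLock);
        LWLockRelease(oldPartitionLock);
    }

Since opening that window requires holding the new tag's partition lock
exclusively, at most NUM_BUFFER_PARTITIONS such windows can be open at
any instant, which is where the NBuffers + NUM_BUFFER_PARTITIONS bound
comes from.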