On Thu, Apr 21, 2022 at 5:04 AM Yura Sokolov <y.soko...@postgrespro.ru> wrote:
> $ pid=`ps x | awk '/checkpointer/ && !/awk/ { print $1 }'`
> $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
> $1 = 16512
>
> $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
> ...
> $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
> $1 = 20439
>
> $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
> ...
> $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
> $1 = 20541
>
> It stabilizes at 20541
Hmm. So is the existing comment incorrect? Remember, I was complaining
about this change:

--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -481,10 +481,10 @@ StrategyInitialize(bool init)
  *
  * Since we can't tolerate running out of lookup table entries, we must be
  * sure to specify an adequate table size here.  The maximum steady-state
- * usage is of course NBuffers entries, but BufferAlloc() tries to insert
- * a new entry before deleting the old.  In principle this could be
- * happening in each partition concurrently, so we could need as many as
- * NBuffers + NUM_BUFFER_PARTITIONS entries.
+ * usage is of course NBuffers entries. But due to concurrent
+ * access to numerous free lists in dynahash we can miss free entry that
+ * moved between free lists. So it is better to have some spare free entries
+ * to reduce probability of entry allocations after server start.
  */
 InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

Pre-patch, the comment claims that the maximum number of buffer entries
that can be simultaneously used is limited to NBuffers +
NUM_BUFFER_PARTITIONS, and that's why we make the hash table that size.
The idea is that we normally need no more than 1 entry per buffer, but
sometimes we might have 2 entries for the same buffer if we're in the
process of changing the buffer tag, because we make the new entry
before removing the old one. To change the buffer tag, we need the
buffer mapping lock for the old partition and the new one, but if both
are the same, we need only one buffer mapping lock. That means that in
the worst case, you could have a number of processes equal to
NUM_BUFFER_PARTITIONS each in the process of changing the buffer tag
between values that both fall into the same partition, and thus each
using 2 entries. Then you could have every other buffer in use and thus
using 1 entry, for a total of NBuffers + NUM_BUFFER_PARTITIONS entries.

Now I think you're saying we go far beyond that number, and what I
wonder is how that's possible. (Notably, your starting value of 16512
is exactly 16384 + 128, which is NBuffers + NUM_BUFFER_PARTITIONS at
the default shared_buffers of 128MB, so 20541 would be several thousand
entries past the supposed cap.) If the system doesn't work the way the
comment says it does, maybe we ought to start by talking about what to
do about that.

I am a bit confused by your description of having done "p
SharedBufHash->hctl->allocated.value" because SharedBufHash is of type
HTAB and HTAB's hctl member is of type HASHHDR, which has no field
called "allocated". I thought maybe my analysis here was somehow
mistaken, so I tried the debugger, which took the same view of it that
I did:

(lldb) p SharedBufHash->hctl->allocated.value
error: <user expression 0>:1:22: no member named 'allocated' in 'HASHHDR'
SharedBufHash->hctl->allocated.value
~~~~~~~~~~~~~~~~~~~ ^

-- 
Robert Haas
EDB: http://www.enterprisedb.com
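P.S. For concreteness, here is a minimal sketch of the
insert-before-delete window the old comment is describing. This is not
the actual BufferAlloc() code: the deadlock-avoiding lock ordering, the
buffer header manipulation, usage counts, and all error handling are
omitted, and the function name is made up. Only BufTableHashCode(),
BufMappingPartitionLock(), BufTableInsert(), BufTableDelete(), and the
LWLock calls are the real interfaces:

    #include "postgres.h"
    #include "storage/buf_internals.h"
    #include "storage/lwlock.h"

    /*
     * Sketch only: re-tagging a buffer transiently consumes two
     * buffer-mapping entries, because the entry for the new tag is
     * inserted while the entry for the old tag still exists.
     */
    static void
    retag_buffer_sketch(BufferDesc *buf, BufferTag oldTag, BufferTag newTag)
    {
        uint32      oldHash = BufTableHashCode(&oldTag);
        uint32      newHash = BufTableHashCode(&newTag);
        LWLock     *oldPartitionLock = BufMappingPartitionLock(oldHash);
        LWLock     *newPartitionLock = BufMappingPartitionLock(newHash);

        LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
        if (newPartitionLock != oldPartitionLock)
            LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);

        /* Insert the mapping for the new tag first ... */
        (void) BufTableInsert(&newTag, newHash, buf->buf_id);

        /* ... at this point the buffer has two lookup-table entries ... */

        /* ... and only then remove the mapping for the old tag. */
        BufTableDelete(&oldTag, oldHash);

        if (newPartitionLock != oldPartitionLock)
            LWLockRelease(newPartitionLock);
        LWLockRelease(oldPartitionLock);
    }

Since opening that window requires holding the new tag's partition lock
exclusively, at most NUM_BUFFER_PARTITIONS such windows can be open at
any instant, which is where the NBuffers + NUM_BUFFER_PARTITIONS bound
comes from.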