On Wed, Apr 6, 2022 at 9:17 AM Yura Sokolov <y.soko...@postgrespro.ru> wrote:
> I skipped v10 since I used it internally for variant
> "insert entry with dummy index then search victim".
Hi,

I think there's a big problem with this patch:

--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -481,10 +481,10 @@ StrategyInitialize(bool init)
 	 *
 	 * Since we can't tolerate running out of lookup table entries, we must be
 	 * sure to specify an adequate table size here. The maximum steady-state
-	 * usage is of course NBuffers entries, but BufferAlloc() tries to insert
-	 * a new entry before deleting the old. In principle this could be
-	 * happening in each partition concurrently, so we could need as many as
-	 * NBuffers + NUM_BUFFER_PARTITIONS entries.
+	 * usage is of course NBuffers entries. But due to concurrent
+	 * access to numerous free lists in dynahash we can miss free entry that
+	 * moved between free lists. So it is better to have some spare free entries
+	 * to reduce probability of entry allocations after server start.
 	 */
 	InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

With the existing system, there is a hard cap on the number of hash table
entries that we can ever need: one per buffer, plus one per partition to
cover the "extra" entries that are needed while changing buffer tags.

With the patch, the number of concurrent buffer tag changes is no longer
limited by NUM_BUFFER_PARTITIONS, because you release the lock on the old
buffer partition before acquiring the lock on the new partition, and
therefore there can be any number of backends trying to change buffer tags
at the same time. But that means, as the comment implies, that there's no
longer a hard cap on how many hash table entries we might need.

I don't think we can just accept the risk that the hash table might try to
allocate after startup. If it tries, it might fail, because all of the
extra shared memory that we allocate at startup may already have been
consumed, and then somebody's query may randomly error out. That's not OK.
It's true that very few users are likely to be affected, because most
people won't consume the extra shared memory, and of those who do, most
won't hammer the system hard enough to cause an error. However, I don't
see us deciding that it's OK to ship something that could randomly break
just because it won't do so very often.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
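
[Editor's illustrative sketch of the locking-order difference discussed above.
The function names retag_existing and retag_patched are invented for this
example, the deadlock-avoidance and same-partition handling of the real
BufferAlloc() are omitted, and the "patched" flow is reconstructed from the
description in the mail, not taken from the actual patch.]

/* Simplified sketch only -- not the real BufferAlloc() logic. */

#include "postgres.h"

#include "storage/buf_internals.h"
#include "storage/lwlock.h"

/*
 * Existing ordering: the new entry is inserted while the old one still
 * exists, and both partition locks are held for the whole operation.
 * Each partition can therefore hold at most one "extra" entry at a time,
 * which is where NBuffers + NUM_BUFFER_PARTITIONS comes from.
 */
static void
retag_existing(BufferDesc *buf, BufferTag *oldTag, uint32 oldHash,
			   BufferTag *newTag, uint32 newHash)
{
	LWLock	   *oldPartitionLock = BufMappingPartitionLock(oldHash);
	LWLock	   *newPartitionLock = BufMappingPartitionLock(newHash);

	/* (the real code orders these two acquisitions to avoid deadlock) */
	LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
	LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);

	BufTableInsert(newTag, newHash, buf->buf_id);	/* extra entry exists here */
	BufTableDelete(oldTag, oldHash);				/* ... until here */

	LWLockRelease(oldPartitionLock);
	LWLockRelease(newPartitionLock);
}

/*
 * Patched ordering, as described above: the old partition lock is released
 * before the new one is acquired.  Any number of backends can sit between
 * the delete and the insert at once, and the inserting backend may miss the
 * entry freed by the delete if it went to a different dynahash free list,
 * forcing a fresh allocation from shared memory.
 */
static void
retag_patched(BufferDesc *buf, BufferTag *oldTag, uint32 oldHash,
			  BufferTag *newTag, uint32 newHash)
{
	LWLock	   *oldPartitionLock = BufMappingPartitionLock(oldHash);
	LWLock	   *newPartitionLock = BufMappingPartitionLock(newHash);

	LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
	BufTableDelete(oldTag, oldHash);
	LWLockRelease(oldPartitionLock);

	LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
	BufTableInsert(newTag, newHash, buf->buf_id);
	LWLockRelease(newPartitionLock);
}

[In the first ordering, the exclusive partition locks bound the overshoot
to one entry per partition; in the second, nothing bounds how many backends
are between the delete and the insert, which is why the
NBuffers + NUM_BUFFER_PARTITIONS sizing is no longer a hard cap.]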