Hi, On 2022-08-17 10:18:14 +0530, Amit Kapila wrote: > > Looking at the non-recovery code makes me even more suspicious: > > > > /* > > * Physically allocate the new bucket's primary page. We want to > > do this > > * before changing the metapage's mapping info, in case we can't > > get the > > * disk space. Ideally, we don't need to check for cleanup lock on > > new > > * bucket as no other backend could find this bucket unless meta > > page is > > * updated. However, it is good to be consistent with old bucket > > locking. > > */ > > buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM); > > if (!IsBufferCleanupOK(buf_nblkno)) > > { > > _hash_relbuf(rel, buf_oblkno); > > _hash_relbuf(rel, buf_nblkno); > > goto fail; > > } > > > > > > _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which > > memset(0)s the whole page. What does it even mean to check whether you > > effectively have a cleanup lock after you zeroed out the page? > > > > Reading the README and the comment above makes me wonder if this whole > > cleanup > > lock business here is just cargo culting and could be dropped? > > > > I think it is okay to not acquire a clean-up lock on the new bucket > page both in recovery and non-recovery paths. It is primarily required > on the old bucket page to avoid concurrent scans/inserts. As mentioned > in the comments and as per my memory serves, it is mainly for keeping > it consistent with old bucket locking.
It's not keeping it consistent with bucket locking to zero out a page before getting a cleanup lock, hopefully at least. This code is just broken on multiple fronts, and consistency isn't a defense. Greetings, Andres Freund