On Thu, Nov 16, 2017 at 2:16 PM, Jeff Janes <jeff.ja...@gmail.com> wrote: > e2c79e14 was to fix a pre-existing bug, but probably e9568083 made that bug > easier to hit than it was before. (Which is not to say that e9568083 can't > contain bugs of its own, of course)
I get that. I think that the same thing may have happened again, in fact. A pending list recycling interlock bug may have merely been unmasked by your commit e9568083; I'm speculating that your commit made it more likely to hit in practice, just as with the earlier bug you mentioned. As you know, there is a significant history of VACUUM page recycling bugs in GIN. I wouldn't be surprised if we found one more in the pending list logic. Consider commit ac4ab97e, a bugfix from late 2013 for posting list page deletion, where the current posting list locking protocol was more or less established. The commit message says: """ ... If a page is deleted, and reused for something else, just as a search is following a rightlink to it from its left sibling, the search would continue scanning whatever the new contents of the page are. That could lead to incorrect query results, or even something more curious if the page is reused for a different kind of a page. To fix, modify the search algorithm to lock the next page before releasing the previous one, and refrain from deleting pages from the leftmost branch of the [posting] tree. ... """ I believe that this commit had GIN avoid deletion of the leftmost branch of posting trees because that makes it safe to get to the posting tree root from a raw root block number (a raw block number from the B-tree index constructed over key values). The buffer lock pairing/crabbing this commit adds to posting lists (similar to the pairing/crabbing that pending lists had from day one, when first added in 2011) can prevent a concurrent deletion once you reach the posting tree root. But, as I said, you still need to reliably get to the root in the first place, which the "don't delete leftmost" rule ensures (if you can never delete it, you can never recycle it, and no reader gets confused). It's not clear why it's safe to recycle the pending list pages at all (though deletion alone might well be okay, because readers can notice a page is deleted if it isn't yet recycled). Why is it okay that we can jump to the first block from the meta page in the face of concurrent recycling? Looking at scanPendingInsert(), it does appear that readers do buffer lock crabbing/coupling for the meta page and the first pending page. So that's an encouraging sign. However, ginInsertCleanup() *also* uses shared buffer locks for both meta page and list head page (never exclusive buffer locks) in exactly the same way when first establishing what block is at the head of the list to be zapped. It's acting as if it is a reader, but it's actually a writer. I don't follow why this is correct. It looks like scanPendingInsert() can decide that it knows where the head of the pending list is *after* ginInsertCleanup() has already decided that that same head of the list block needs to possibly be deleted/recycled. Have I missed something? We really ought to make the asserts on gin page type into "can't happen" elog errors, as a defensive measure, in both scanPendingInsert() and ginInsertCleanup(). The 2013 bug fix actually did this for the posting tree, but didn't touch similar pending list code. -- Peter Geoghegan