On Thu, Apr 16, 2020 at 12:30 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > For btree indexes, IIRC skipping index cleanup could not be a cause of > corruption, but be a cause of index bloat since it leaves recyclable > pages which are not marked as recyclable.
I spotted a bug in "Skip full index scan during cleanup of B-tree indexes when possible" which is unrelated to the index cleanup issue. This code is wrong, because we don't have a buffer lock (or a buffer pin) on the buffer anymore: ndel = _bt_pagedel(rel, buf); /* count only this page, else may double-count parent */ if (ndel) { stats->pages_deleted++; if (!TransactionIdIsValid(vstate->oldestBtpoXact) || TransactionIdPrecedes(opaque->btpo.xact, vstate->oldestBtpoXact)) vstate->oldestBtpoXact = opaque->btpo.xact; } MemoryContextSwitchTo(oldcontext); /* pagedel released buffer, so we shouldn't */ (As the comment says, _bt_pagedel() releases it.) There is another, more fundamental issue, though: _bt_pagedel() can delete more than one page. That might be okay if the "extra" pages were always internal pages, but they're not -- it says so in the comments above _bt_pagedel(). See the code at the end of _bt_pagedel(), that says something about why we delete the right sibling page in some cases. I think that the fix is to push down the vstate into lower level code in nbtpage.c. Want to have a go at fixing it? (It would be nice if we could teach Valgrind to "poison" buffers when we don't have a pin held...that would probably have caught this issue almost immediately.) -- Peter Geoghegan