Hi,

On 2023-01-09 10:16:03 -0800, Peter Geoghegan wrote:
> Attached is v3, which explicitly checks the need to set the PD_ALL_VISIBLE
> flag at the relevant visibilitymap_set() call site. It also has improved
> comments.
Afaict we'll need to backpatch this all the way?

> From e7788ebdb589fb7c6f866cf53658cc369f9858b5 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <p...@bowt.ie>
> Date: Sat, 31 Dec 2022 15:13:01 -0800
> Subject: [PATCH v3] Don't accidentally unset all-visible bit in VM.
>
> One of the calls to visibilitymap_set() during VACUUM's initial heap
> pass could unset a page's all-visible bit during the process of setting
> the same page's all-frozen bit.

As just mentioned upthread, this just seems wrong.

> This could happen in the event of a
> concurrent HOT update from a transaction that aborts soon after. Since
> the all_visible_according_to_vm local variable that lazy_scan_heap works
> off of when setting the VM doesn't reflect the current state of the VM,
> and since visibilitymap_set() just requested that the all-frozen bit get
> set in one case, there was a race condition. Heap pages could initially
> be all-visible just as all_visible_according_to_vm is established, then
> not be all-visible after the update, and then become eligible to be set
> all-visible once more following pruning by VACUUM. There is no reason
> why VACUUM can't remove a concurrently aborted heap-only tuple right
> away, and so no reason why such a page won't be able to reach the
> relevant visibilitymap_set() call site.

Do you have a reproducer for this?

> @@ -1120,8 +1123,8 @@ lazy_scan_heap(LVRelState *vacrel)
>  		 * got cleared after lazy_scan_skip() was called, so we must recheck
>  		 * with buffer lock before concluding that the VM is corrupt.
>  		 */
> -		else if (all_visible_according_to_vm && !PageIsAllVisible(page)
> -				 && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
> +		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
> +				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
>  		{
>  			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
>  				 vacrel->relname, blkno);

Hm.
The message gets a bit less accurate with the change. Perhaps OK? OTOH, it
might be useful to know what bit was wrong when debugging problems.

> @@ -1164,12 +1167,34 @@ lazy_scan_heap(LVRelState *vacrel)
>  				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
>  		{
>  			/*
> -			 * We can pass InvalidTransactionId as the cutoff XID here,
> -			 * because setting the all-frozen bit doesn't cause recovery
> -			 * conflicts.
> +			 * Avoid relying on all_visible_according_to_vm as a proxy for the
> +			 * page-level PD_ALL_VISIBLE bit being set, since it might have
> +			 * become stale -- even when all_visible is set in prunestate.
> +			 *
> +			 * Consider the example of a page that starts out all-visible and
> +			 * then has a tuple concurrently deleted by an xact that aborts.
> +			 * The page will be all_visible_according_to_vm, and will have
> +			 * all_visible set in prunestate. It will nevertheless not have
> +			 * PD_ALL_VISIBLE set by here (plus neither VM bit will be set).
> +			 * And so we must check if PD_ALL_VISIBLE needs to be set.
>  			 */
> +			if (!PageIsAllVisible(page))
> +			{
> +				PageSetAllVisible(page);
> +				MarkBufferDirty(buf);
> +			}
> +
> +			/*
> +			 * Set the page all-frozen (and all-visible) in the VM.
> +			 *
> +			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
> +			 * since a snapshotConflictHorizon sufficient to make everything
> +			 * safe for REDO was logged when the page's tuples were frozen.
> +			 */
> +			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
>  			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
>  							  vmbuffer, InvalidTransactionId,
> +							  VISIBILITYMAP_ALL_VISIBLE |
>  							  VISIBILITYMAP_ALL_FROZEN);
>  		}
>
> @@ -1311,7 +1336,11 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
>
>  		/* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
>  		if (!vacrel->skipwithvm)
> +		{
> +			/* Caller shouldn't rely on all_visible_according_to_vm */
> +			*next_unskippable_allvis = false;
>  			break;
> +		}
>
>  		/*
>  		 * Aggressive VACUUM caller can't skip pages just because they are
> @@ -1818,7 +1847,11 @@ retry:
>  		 * cutoff by stepping back from OldestXmin.
>  		 */
>  		if (prunestate->all_visible && prunestate->all_frozen)
> +		{
> +			/* Using same cutoff when setting VM is now unnecessary */
>  			snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
> +			prunestate->visibility_cutoff_xid = InvalidTransactionId;
> +		}
>  		else
>  		{
>  			/* Avoids false conflicts when hot_standby_feedback in use */
> @@ -2388,8 +2421,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
>  static void
>  lazy_vacuum_heap_rel(LVRelState *vacrel)
>  {
> -	int			index;
> -	BlockNumber vacuumed_pages;
> +	int			index = 0;
> +	BlockNumber vacuumed_pages = 0;
>  	Buffer		vmbuffer = InvalidBuffer;
>  	LVSavedErrInfo saved_err_info;
>
> @@ -2406,42 +2439,42 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
>  									 VACUUM_ERRCB_PHASE_VACUUM_HEAP,
>  									 InvalidBlockNumber, InvalidOffsetNumber);
>
> -	vacuumed_pages = 0;
> -
> -	index = 0;
>  	while (index < vacrel->dead_items->num_items)
>  	{
> -		BlockNumber tblk;
> +		BlockNumber blkno;
>  		Buffer		buf;
>  		Page		page;
>  		Size		freespace;
>
>  		vacuum_delay_point();

I still think such changes are inappropriate for a bugfix, particularly one
that needs to be backpatched.

Greetings,

Andres Freund