Hi,

On 2023-01-09 10:16:03 -0800, Peter Geoghegan wrote:
> Attached is v3, which explicitly checks the need to set the PD_ALL_VISIBLE
> flag at the relevant visibilitymap_set() call site. It also has improved
> comments.

Afaict we'll need to backpatch this all the way?


> From e7788ebdb589fb7c6f866cf53658cc369f9858b5 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <p...@bowt.ie>
> Date: Sat, 31 Dec 2022 15:13:01 -0800
> Subject: [PATCH v3] Don't accidentally unset all-visible bit in VM.
> 
> One of the calls to visibilitymap_set() during VACUUM's initial heap
> pass could unset a page's all-visible bit during the process of setting
> the same page's all-frozen bit.

As just mentioned upthread, this just seems wrong.


> This could happen in the event of a
> concurrent HOT update from a transaction that aborts soon after.  Since
> the all_visible_according_to_vm local variable that lazy_scan_heap works
> off of when setting the VM doesn't reflect the current state of the VM,
> and since visibilitymap_set() just requested that the all-frozen bit get
> set in one case, there was a race condition.  Heap pages could initially
> be all-visible just as all_visible_according_to_vm is established, then
> not be all-visible after the update, and then become eligible to be set
> all-visible once more following pruning by VACUUM.  There is no reason
> why VACUUM can't remove a concurrently aborted heap-only tuple right
> away, and so no reason why such a page won't be able to reach the
> relevant visibilitymap_set() call site.

Do you have a reproducer for this?


> @@ -1120,8 +1123,8 @@ lazy_scan_heap(LVRelState *vacrel)
>                * got cleared after lazy_scan_skip() was called, so we must recheck
>                * with buffer lock before concluding that the VM is corrupt.
>                */
> -             else if (all_visible_according_to_vm && !PageIsAllVisible(page)
> -                              && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
> +             else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
> +                              visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
>               {
>                       elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
>                                vacrel->relname, blkno);

Hm. The message gets a bit less accurate with the change. Perhaps that's OK? OTOH,
it might be useful to know which bit was wrong when debugging problems.


> @@ -1164,12 +1167,34 @@ lazy_scan_heap(LVRelState *vacrel)
>                                !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
>               {
>                       /*
> -                      * We can pass InvalidTransactionId as the cutoff XID here,
> -                      * because setting the all-frozen bit doesn't cause recovery
> -                      * conflicts.
> +                      * Avoid relying on all_visible_according_to_vm as a proxy for the
> +                      * page-level PD_ALL_VISIBLE bit being set, since it might have
> +                      * become stale -- even when all_visible is set in prunestate.
> +                      *
> +                      * Consider the example of a page that starts out all-visible and
> +                      * then has a tuple concurrently deleted by an xact that aborts.
> +                      * The page will be all_visible_according_to_vm, and will have
> +                      * all_visible set in prunestate.  It will nevertheless not have
> +                      * PD_ALL_VISIBLE set by here (plus neither VM bit will be set).
> +                      * And so we must check if PD_ALL_VISIBLE needs to be set.
>                        */
> +                     if (!PageIsAllVisible(page))
> +                     {
> +                             PageSetAllVisible(page);
> +                             MarkBufferDirty(buf);
> +                     }
> +
> +                     /*
> +                      * Set the page all-frozen (and all-visible) in the VM.
> +                      *
> +                      * We can pass InvalidTransactionId as our visibility_cutoff_xid,
> +                      * since a snapshotConflictHorizon sufficient to make everything
> +                      * safe for REDO was logged when the page's tuples were frozen.
> +                      */
> +                     Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
>                       visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
>                                                         vmbuffer, InvalidTransactionId,
> +                                                       VISIBILITYMAP_ALL_VISIBLE |
>                                                         VISIBILITYMAP_ALL_FROZEN);
>               }
>  
> @@ -1311,7 +1336,11 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
>  
>               /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
>               if (!vacrel->skipwithvm)
> +             {
> +                     /* Caller shouldn't rely on all_visible_according_to_vm */
> +                     *next_unskippable_allvis = false;
>                       break;
> +             }
>  
>               /*
>                * Aggressive VACUUM caller can't skip pages just because they are
> @@ -1818,7 +1847,11 @@ retry:
>                        * cutoff by stepping back from OldestXmin.
>                        */
>                       if (prunestate->all_visible && prunestate->all_frozen)
> +                     {
> +                             /* Using same cutoff when setting VM is now unnecessary */
>                               snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
> +                             prunestate->visibility_cutoff_xid = InvalidTransactionId;
> +                     }
>                       else
>                       {
>                               /* Avoids false conflicts when hot_standby_feedback in use */
> @@ -2388,8 +2421,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
>  static void
>  lazy_vacuum_heap_rel(LVRelState *vacrel)
>  {
> -     int                     index;
> -     BlockNumber vacuumed_pages;
> +     int                     index = 0;
> +     BlockNumber vacuumed_pages = 0;
>       Buffer          vmbuffer = InvalidBuffer;
>       LVSavedErrInfo saved_err_info;
>  
> @@ -2406,42 +2439,42 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
>                                                        VACUUM_ERRCB_PHASE_VACUUM_HEAP,
>                                                        InvalidBlockNumber, InvalidOffsetNumber);
>  
> -     vacuumed_pages = 0;
> -
> -     index = 0;
>       while (index < vacrel->dead_items->num_items)
>       {
> -             BlockNumber tblk;
> +             BlockNumber blkno;
>               Buffer          buf;
>               Page            page;
>               Size            freespace;
>  
>               vacuum_delay_point();

I still think such changes are inappropriate for a bugfix, particularly one
that needs to be backpatched.

Greetings,

Andres Freund

