On Thu, Apr 29, 2021 at 1:22 PM Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Wed, Apr 28, 2021 at 7:34 PM tanghy.f...@fujitsu.com
> <tanghy.f...@fujitsu.com> wrote:
> > >TRAP: FailedAssertion("!all_visible_according_to_vm || prunestate.all_visible", File: "vacuumlazy.c", Line: 1347, PID: 1274675)
> >
> > BTW, in asynchronous mode the same problem can also happen, but at a low
> > frequency. (I tried many times, but the problem happened only twice.)
> > As for synchronous mode, I found it easier to reproduce the problem by
> > setting "autovacuum_naptime = 1".
> > But it still can't be reproduced 100% of the time. (I tested it 5 times,
> > and 3 of them reproduced it.)

Thanks for reporting the issue!

>
> Is setting all_visible_according_to_vm false as below enough to avoid
> the assertion failure?
>
> diff --git a/src/backend/access/heap/vacuumlazy.c
> b/src/backend/access/heap/vacuumlazy.c
> index c3fc12d76c..76c17e063e 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1146,6 +1146,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
> *params, bool aggressive)
>              {
>                  ReleaseBuffer(vmbuffer);
>                  vmbuffer = InvalidBuffer;
> +                all_visible_according_to_vm = false;
>              }
>
>              /* Remove the collected garbage tuples from table and indexes */

Since we set all_visible_according_to_vm before acquiring the buffer
lock, it's possible that the page gets modified and its all-visible
bit gets cleared after all_visible_according_to_vm has been set to
true. The assertion failure can easily be reproduced by adding a delay
before taking the buffer lock and running autovacuum frequently:

diff --git a/src/backend/access/heap/vacuumlazy.c
b/src/backend/access/heap/vacuumlazy.c
index c3fc12d76c..76f067a7e4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1180,6 +1180,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
*params, bool aggressive)
        buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno,
                                 RBM_NORMAL, vacrel->bstrategy);

+       pg_usleep(100000);
+
        /*
         * We need buffer cleanup lock so that we can prune HOT chains and
         * defragment the page.

So we would also have to recheck the visibility map bit there. But I
think we can simply remove this assertion, since later code already
rechecks the bit and doesn't treat this case as a should-not-happen
case:

        /*
         * As of PostgreSQL 9.2, the visibility map bit should never be set if
         * the page-level bit is clear.  However, it's possible that the bit
         * got cleared after we checked it and before we took the buffer
         * content lock, so we must recheck before jumping to the conclusion
         * that something bad has happened.
         */
        else if (all_visible_according_to_vm && !PageIsAllVisible(page)
                 && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
        {
            elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
                 vacrel->relname, blkno);
            visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
                                VISIBILITYMAP_VALID_BITS);
        }
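
(For what it's worth, if we wanted to keep an assertion at that point, it
would have to re-read the visibility map bit itself, along the lines of the
untested sketch below, which reuses the variables already in scope in
lazy_scan_heap(). That seems like more trouble than it's worth, given that
the recheck above already handles the case.)

        /*
         * Hypothetical alternative (not what the attached patch does):
         * tolerate a concurrently-cleared VM bit by rechecking it before
         * asserting.
         */
        Assert(!all_visible_according_to_vm ||
               prunestate.all_visible ||
               !VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer));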

I've attached a patch removing the assertion.

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c3fc12d76c..47ac6385d1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1344,7 +1344,6 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		lazy_scan_prune(vacrel, buf, blkno, page, vistest, &prunestate);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
-		Assert(!all_visible_according_to_vm || prunestate.all_visible);
 
 		/* Remember the location of the last page with nonremovable tuples */
 		if (prunestate.hastup)
