Attached are two patches, which together fix two historical buglets around VACUUM's approach to setting bits in the visibility map. (Whether this is actually refactoring work or hardening work is debatable, I suppose.)
The first patch makes sure that the snapshotConflictHorizon cutoff (the XID cutoff for recovery conflicts) is never a special XID, unless that XID is InvalidTransactionId, which is interpreted as a snapshotConflictHorizon value that will never require a recovery conflict (per the general convention for snapshotConflictHorizon values explained above ResolveRecoveryConflictWithSnapshot). In other words, it establishes a hard rule: a snapshotConflictHorizon value is either InvalidTransactionId or a normal XID. An assertion enforces the rule in REDO routines, at the point where they call ResolveRecoveryConflictWithSnapshot with the WAL record's snapshotConflictHorizon XID value.

The second patch makes sure that VACUUM can never set a page all-frozen in the visibility map without also setting the same page all-visible in the same call to visibilitymap_set() -- regardless of what we think we know about the current state of the all-visible bit in the VM. It adjusts one of the visibilitymap_set() calls in vacuumlazy.c that could previously set a page's all-frozen bit without also setting its all-visible bit, which allowed VACUUM to leave a page all-frozen but not all-visible in the visibility map (since the value of all_visible_according_to_vm can go stale). I think this should be treated as a basic visibility map invariant: an all-frozen page must also be all-visible, by definition, so why should it even be physically possible for the VM to give a contradictory picture of the all_visible/all_frozen status of any one page? Assertions are added that more or less make this rule into an invariant. amcheck/pg_visibility coverage might make sense too, but I haven't done that here.

The second patch also adjusts a later visibilitymap_set() call site (the one used just after heap vacuuming runs in the final heap pass) in roughly the same way: it no longer reads from the visibility map to see what bits need to be changed.
The existing approach here seems rather odd. The whole point of calling lazy_vacuum_heap_page() is to set LP_DEAD items referenced by VACUUM's dead_items array to LP_UNUSED -- there must have been at least one LP_DEAD item on the page for us to end up here (which a Postgres 14 era assertion verifies for us). So we already know perfectly well that the visibility map shouldn't indicate that the page is all-visible yet -- why bother asking the VM? Besides, any call to visibilitymap_set() will only modify the VM when it directly observes that the bits have changed -- so why attempt to duplicate that on the caller side?

It seems to me that the visibilitymap_get_status() call inside lazy_vacuum_heap_page() is really being abused as a substitute for visibilitymap_pin(). Why not use top-level visibilitymap_pin() calls instead, just as we do in the first heap pass? That's how the second patch does it: it adds a visibilitymap_pin() call to lazy_vacuum_heap_rel()'s blkno-wise loop. That gives us parity between the first and second heap pass, which seems like a clear maintainability win -- every caller can pass the already-pinned/already-set-up vmbuffer by value.

-- 
Peter Geoghegan
v1-0001-Avoid-special-XIDs-in-snapshotConflictHorizon-val.patch
Description: Binary data
v1-0002-Never-just-set-the-all-frozen-bit-in-VM.patch
Description: Binary data