> On Mar 3, 2026, at 07:38, Melanie Plageman <[email protected]> wrote:
>
> On Fri, Feb 20, 2026 at 12:59 PM Andres Freund <[email protected]> wrote:
>>
>> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
>>
>>> I could see an argument for moving identify_and_fix_vm_corruption()
>>> out of the helper and into heap_page_prune_and_freeze() but then we'd
>>> have to move visibilitymap_get_status() out too. And that takes away a
>>> lot of the benefit of encapsulating all that logic.
>>
>> I was wondering about that option. Relatedly, I also was wondering if we
>> ought
>> to do identify_and_fix_vm_corruption() regardless of ->attempt_update_vm.
>
> Attached v35 does this. I always pin the vmbuffer if we are going to
> prune in heap_page_prune_opt(). In many cases, because it's saved in
> the scan descriptor, it won't actually need to take a new pin. During
> pruning, I check for VM corruption even if I am not considering
> setting the VM.
>
>>> Well, after this patch set, clearing the VM does happen before we emit
>>> WAL for pruning.
>>
>> That I think is a substantial improvement, the current (i.e. before your
>> series) placement really is pretty insane due to the guaranteed divergence it
>> causes.
>>
>> I wonder if we actually should just force an FPI whenever we detect such
>> corruption, that way it would reliably be fixed on the standby as well.
>
> Only problem is we would have to do an FPI of the VM page as well if
> we wanted the corruption to be reliably fixed on the standby.
>
>>> It wouldn't be hard to move the corruption fixups to the beginning of
>>> heap_page_prune_and_freeze() in the new code structure.
>>
>> As identify_and_fix_vm_corruption() needs lpdead_items, I'm not sure that's
>> true?
>>
>> I wonder if at least the warning for the "(PageIsAllVisible(heap_page) &&
>> nlpdead_items > 0)" test should be moved to
>> heap_prune_record_dead_or_unused(). That way the WARNING could include the
>> offset number and it'd also work in the mark_unused_now case.
>>
>> Perhaps it also should trigger for RECENTLY_DEAD, INSERT_IN_PROGRESS,
>> DELETE_IN_PROGRESS?
>>
>> At that point the !page_all_visible && vm_all_visible part could indeed be
>> moved to the start of heap_page_prune_and_freeze()
>
> I've done all this. There is a heap page/VM corruption check at the
> beginning of heap_page_prune_and_freeze(), and then corruption is checked
> for during pruning in the previously covered case (lpdead items) as well
> as the mark_unused_now case and
> RECENTLY_DEAD/INSERT_IN_PROGRESS/DELETE_IN_PROGRESS.
>
>>> Would it be worth it? What benefit would we get? Do you just feel that it
>>> should logically come first?
>>
>> One insanity is that right now we will process all frozen pages over and over
>> due to the skip pages threshold, wasting a *lot* of CPU and memory bandwidth.
>> It'd be quite defensible to just skip processing the page once we determined
>> it's already all frozen. But for that we'd probably want to do the
>> "page_all_visible && vm_all_visible" check before returning...
>
> I've added a fast path to bypass pruning/freezing when the page is
> already all-visible. And I check for page_all_visible && vm_all_visible
> beforehand. The one downside is that if a page is marked all-frozen but
> has dead tuples on it, we'll never get to fix that corruption nor clean
> up the dead tuples. But the fast path seems worth it to me.
>
>>>> Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
>>>> HEAP_PAGE_PRUNE_UPDATE_VM would be set?
>>>
>>> Yes, when setting the VM on-access, it is too expensive to call
>>> heap_prepare_freeze_tuple() on each tuple. I could work on trying to
>>> optimize it, but it isn't currently viable.
>>
>> Is it too expensive to do so even when we already decided to do some pruning?
>> I am not surprised it's too expensive when there's not even a dead tuple on
>> the page. But I am mildly surprised if it's too expensive to do when we'd
>> WAL
>> log anyway?
>
> It's not really possible in the current code structure to only call
> heap_prepare_freeze_tuple() when there are at least some prunable
> tuples. We go through the line pointers and record them as prunable at
> the same time we call heap_prepare_freeze_tuple(), so we won't know
> until we've examined all line pointers that there are no prunable
> tuples, at which point we will have called heap_prepare_freeze_tuple()
> for every tuple.
>
>>> I think using all_frozen_except_dead while maintaining
>>> visibility_cutoff_xid (in heap_prune_record_unchanged_lp_normal()) has
>>> the potential to be confusing, though. We'd need to keep updating
>>> visibility_cutoff_xid when all_visible is false but
>>> all_frozen_except_dead is true as well as when all_visible is true.
>>> And because we don't care about all_visible_except_dead, it gets even
>>> more confusing to make sure we are maintaining the right variables in
>>> the right situations.
>>
>> I suspect we should just track all of the horizons/cutoffs all the time. This
>> whole stuff about optimizing out a few conditional assignments complicates
>> the
>> code substantially and feels extremely error prone to me.
>
> I've done this in v35. I posted the freeze horizon tracking patch
> separately in [1] but it is in v35 as 0004. Tracking the newest live
> xid is in 0009. This also always tracks all_visible for all callers
> since I unconditionally pass the vmbuffer now. I still don't set the
> VM if the query is modifying the relation, though.
>
>> I probably complained about this before, and it's not this patch's fault, but
>> PruneState->{all_visible,all_frozen} are imo confusingly named, due to
>> sounding like they describe the current state, rather than the possible state
>> after pruning. It's not helped by this comment:
>>
>> * NOTE: all_visible and all_frozen initially don't include LP_DEAD
>> items.
>> * That's convenient for heap_page_prune_and_freeze() to use them to
>> * decide whether to opportunistically freeze the page or not. The
>> * all_visible and all_frozen values ultimately used to set the VM are
>> * adjusted to include LP_DEAD items after we determine whether or
>> not to
>> * opportunistically freeze.
>>
>> "all-visible ... are adjusted to include LP_DEAD" ... - just reading that
>> it's
>> hard to know what it means.
>
> 0003 does the rename.
>
>> The first thing to improve pruning performance that I would do is to
>> introduce
>> a fastpath for pages that a) are already frozen b) do not have dead items
>> (if
>> we're not freezing). Iterating through HOT chains is far from cheap, and if
>> all rows are live, there's not really a point in doing so. This is
>> particularly important for VACUUMs where we end up freezing a ton of pages
>> that
>> are already frozen, due to the silly skip_pages_threshold thing.
>
> 0007 adds a fast path.
>
>>> +static TransactionId
>>> +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
>>> +				 uint8 old_vmbits, uint8 new_vmbits,
>>> +				 TransactionId latest_xid_removed,
>>> +				 TransactionId frz_conflict_horizon,
>>> +				 TransactionId visibility_cutoff_xid)
>>> +{
>>> + TransactionId conflict_xid;
>>> +
>>> + /*
>>> + * We can omit the snapshot conflict horizon if we are not pruning or
>>> + * freezing any tuples and are setting an already all-visible page
>>> + * all-frozen in the VM.
>>
>> Maybe mention when this can happen, because it's not immediately obvious.
>
> I've added this to my TODO. I honestly can't think of a scenario where
> it can happen. But I remember spending quite a bit of time thinking
> about it on another occasion. The current code (in master) does
> specifically account for this scenario, which is why I kept the logic,
> but I'm not sure how it can happen.
>
> I made all the other changes to specific comments you mentioned in
> your mail but I won't bore you with itemization.
>
>>> if (do_set_vm)
>>> conflict_xid = visibility_cutoff_xid;
>>> else if (do_freeze)
>>> conflict_xid = frz_conflict_horizon;
>>> else
>>> conflict_xid = InvalidTransactionId;
>>
>> Could it be worth checking that if (do_set_vm && do_freeze) the
>> frz_conflict_horizon won't be "violated" by using visibility_cutoff_xid instead?
>
> Yes, as you mentioned off-list, this wasn't right. New code is like this
>
> TransactionId conflict_xid = InvalidTransactionId;
> ...
> if (do_set_vm)
>     conflict_xid = newest_live_xid;
> if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
>     conflict_xid = newest_frozen_xid;
>
>>> From 8d350868206456f631883a40a955dff480e408d3 Mon Sep 17 00:00:00 2001
>>> From: Melanie Plageman <[email protected]>
>>> Date: Wed, 17 Dec 2025 16:51:05 -0500
>>> Subject: [PATCH v34 09/14] Use GlobalVisState in vacuum to determine page
>>> level visibility
>>>
>>> [...]
>>>
>>> Because comparing a transaction ID against GlobalVisState is more
>>> expensive than comparing against a single XID, we defer this check until
>>> after scanning all tuples on the page.
>>
>> Curious, is this a precaution or was this a measurable bottleneck?
>
> I did see GlobalVisTestXidMaybeRunning() in a profile I did when it
> was still called for every HEAPTUPLE_LIVE tuple in
> heap_prune_record_unchanged_lp_normal(), but I don't have the profile
> or test case around anymore.
>
> However, since I now unconditionally maintain the newest_live_xid,
> moving GlobalVisTestXidMaybeRunning() back into
> heap_prune_record_unchanged_lp_normal() wouldn't help us avoid any
> work. It would just make the values of prstate.set_all_visible and
> prstate.set_all_frozen more accurate sooner. But I don't think it's
> worth the extra function call since set_all_frozen and set_all_visible
> won't be totally "done" until after we decide whether or not to
> opportunistically freeze anyway.
>
>>> @@ -1077,6 +1078,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>>> prune_freeze_plan(RelationGetRelid(params->relation),
>>> buffer, &prstate, off_loc);
>>>
>>> +	/*
>>> +	 * After processing all the live tuples on the page, if the newest xmin
>>> +	 * amongst them may be considered running by any snapshot, the page
>>> +	 * cannot be all-visible.
>>> +	 */
>>> +	if (prstate.all_visible &&
>>> +		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
>>
>> Any reason to test IsNormal rather than just IsValid()? There should never
>> be
>> a reason it's a valid but not "normal" xid, right?
>
> Well the reason I did this was that the existing code in master
> tracking visibility_cutoff_xid only advances it if
> TransactionIdIsNormal(). I'm a bit confused about it too because it
> seems like we would still want to do it for bootstrap mode xids. But I
> see PageSetPrunable() only allows normal xids.
>
>>> @@ -1794,28 +1812,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
>>> 				}
>>>
>>> 				/*
>>> -				 * The inserter definitely committed. But is it old enough
>>> -				 * that everyone sees it as committed? A FrozenTransactionId
>>> -				 * is seen as committed to everyone. Otherwise, we check if
>>> -				 * there is a snapshot that considers this xid to still be
>>> -				 * running, and if so, we don't consider the page all-visible.
>>> +				 * The inserter definitely committed. But we don't know if it
>>> +				 * is old enough that everyone sees it as committed. Later,
>>> +				 * after processing all the tuples on the page, we'll check if
>>> +				 * there is any snapshot that still considers the newest xid
>>> +				 * on the page to be running. If so, we don't consider the
>>> +				 * page all-visible.
>>> 				 */
>>> 				xmin = HeapTupleHeaderGetXmin(htup);
>>>
>>> -				/*
>>> -				 * For now always use prstate->cutoffs for this test, because
>>> -				 * we only update 'all_visible' and 'all_frozen' when freezing
>>> -				 * is requested. We could use GlobalVisTestIsRemovableXid
>>> -				 * instead, if a non-freezing caller wanted to set the VM bit.
>>> -				 */
>>> -				Assert(prstate->cutoffs);
>>> -				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
>>> -				{
>>> -					prstate->all_visible = false;
>>> -					prstate->all_frozen = false;
>>> -					break;
>>> -				}
>>> -
>>> 				/* Track newest xmin on page. */
>>> 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
>>> 					TransactionIdIsNormal(xmin))
>>
>> Kinda wonder if this code should be in something like
>> heap_prune_record_freezable() or such, rather than be inside
>> heap_prune_record_unchanged_lp_normal().
>
> I played around with it, but it all felt a bit awkward. I wrote it
> down for a future enhancement idea.
>
>>> Subject: [PATCH v34 10/14] Unset all_visible sooner if not freezing
>>>
>>> In the prune/freeze path, we currently delay clearing all_visible and
>>> all_frozen in the presence of dead items to allow opportunistic
>>> freezing.
>>>
>>> However, if no freezing will be attempted, there’s no need to delay.
>>> Clearing the flags earlier avoids extra bookkeeping in
>>> heap_prune_record_unchanged_lp_normal(). This currently has no runtime
>>> effect because all callers that consider setting the VM also prepare
>>> freeze plans, but upcoming changes will allow on-access pruning to set
>>> the VM without freezing. The extra bookkeeping was noticeable in a
>>> profile of on-access VM setting.
>>
>> What workload was that?
>
> It was a select * offset all query with a few fat tuples on each page
> and none of them prunable. I'm planning on digging up the
> case/creating a new one to see if it is reproducible. This was with an
> older version of the code that had more conditionals as well. This
> commit is actually dropped in v35 because I now always keep
> newest_live_xid up-to-date (0009) which means unsetting
> set_all_visible sooner has no benefit.
>
>> Theoretically, even if we don't freeze, the page still may be all-visible or
>> all frozen after the removal of dead items, no? Practically that won't
>> happen,
>> because we don't remove dead items in any of the relevant paths, but from the
>> commit message and comments that's not entirely clear.
>
> Yea, it's clearer with the commit dropped.
>
>>> @@ -678,6 +678,12 @@ typedef struct EState
>>> *
>>> ExecDoInitialPruning() */
>>> const char *es_sourceText; /* Source text from QueryDesc */
>>>
>>> + /*
>>> + * RT indexes of relations modified by the query through a
>>> + * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
>>> + */
>>> + Bitmapset *es_modified_relids;
>>> +
>>
>> Other EState fields are initialized in CreateExecutorState, this isn't
>> afaict?
>
> Oops, yes. I based it on es_unpruned_relids which wasn't initialized
> there either. I've added a commit (0013) to initialize a few EState
> fields that weren't initialized in CreateExecutorState() as well.
>
>> Wonder if it's worth adding a crosscheck somewhere, verifying that if a
>> relation is modified, it's in es_modified_relids. Otherwise this could very
>> well silently get out of date.
>
> Done in v35 (0014).
>
>> Also, there's some overlap between the information collected this way, and
>> AcquireExecutorLocks(), ScanQueryForLocks(), which determine the needed lock
>> modes via rte->rellockmode.
>
> Those are in parser/planner, so it doesn't seem like a good fit. I
> populate es_modified_relids in the executor.
>
> I don't know exactly what the overlap would be between RTEs with an
> exclusive rellockmode and es_modified_relids. It seems like you could
> have RTEs which don't end up getting modified that have a lock level
> that would have made you think that they would be modified.
>
> But were you imagining a substitution or a cross-check?
>
>>> From 8205b2d7da0c3ad3cbc5cead336ced677996b37d Mon Sep 17 00:00:00 2001
>>> From: Melanie Plageman <[email protected]>
>>> Date: Wed, 3 Dec 2025 15:12:18 -0500
>>> Subject: [PATCH v34 12/14] Pass down information on table modification to
>>> scan
>>> node
>>
>> Perhaps worth splitting up, so the addition of the 0 flag is separate from
>> the read-only hint aspect.
>
> Done.
>
> [1]
> https://www.postgresql.org/message-id/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gmail.com
> <v35-0001-Move-commonly-used-context-into-PruneState-and-s.patch><v35-0002-Add-PageGetPruneXid-helper.patch><v35-0003-Rename-PruneState-all_visible-all_frozen.patch><v35-0004-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch><v35-0005-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch><v35-0006-Fix-visibility-map-corruption-in-more-cases.patch><v35-0007-Add-pruning-fast-path-for-all-visible-and-all-fr.patch><v35-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v35-0009-Keep-newest-live-XID-up-to-date-even-if-page-not.patch><v35-0010-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v35-0011-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v35-0012-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v35-0013-Initialize-missing-fields-in-CreateExecutorState.patch><v35-0014-Track-which-relations-are-modified-by-a-query.patch><v35-0015-Make-begin_scan-functions-take-a-flags-argument.patch><v35-0016-Pass-down-information-on-table-modification-to-s.patch><v35-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch><v35-0018-Set-pd_prune_xid-on-insert.patch>
1 - 0001
```
+prune_freeze_plan(PruneState *prstate, OffsetNumber *off_loc)
{
-	Page page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	Page page = prstate->page;
+	BlockNumber blockno = prstate->block;
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
```
As there is a local “page”, maybe just use the local one for
PageGetMaxOffsetNumber.
0002 looks good.
2 - 0003 - Does it make sense to also do the same renaming in PruneFreezeResult?
3 - 0004
```
-
-	/*
-	 * Calculate what the snapshot conflict horizon should be for a record
-	 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-	 * for conflicts when the whole page is eligible to become all-frozen
-	 * in the VM once we're done with it. Otherwise, we generate a
-	 * conservative cutoff by stepping back from OldestXmin.
-	 */
-	if (prstate->set_all_frozen)
-		prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-	else
-	{
-		/* Avoids false conflicts when hot_standby_feedback in use */
-		prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-		TransactionIdRetreat(prstate->frz_conflict_horizon);
-	}
+	Assert(TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
+										 prstate->cutoffs->OldestXmin));
```
At this point of the Assert, can prstate->pagefrz.FreezePageConflictXid be
InvalidTransactionId? My understanding is no; in that case, would it make
sense to also Assert(prstate->pagefrz.FreezePageConflictXid !=
InvalidTransactionId)? Otherwise, if prstate->pagefrz.FreezePageConflictXid
can still be InvalidTransactionId, then the Assert should be changed to
something like:

Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
       TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
                                     prstate->cutoffs->OldestXmin));
I will continue with 0005 tomorrow.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/