On Thu, Aug 31, 2023 at 2:03 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> I have a few suggestions:
>
> - Rather than removing the rather large comment block at the top of
> lazy_scan_prune() I'd like to see it rewritten to explain how we now
> deal with the problem. I'd suggest leaving the first paragraph ("Prior
> to...") just as it is and replace all the words following "The
> approach we take now is" with a description of the approach that this
> patch takes to the problem.

Good idea. I've updated the comment. I also explain why this new
approach works in the commit message and reference the commit which
added the previous approach.

> - I'm not a huge fan of the caller of heap_page_prune() having to know
> how to initialize the PruneResult. Can heap_page_prune() itself do
> that work, so that presult is an out parameter rather than an in-out
> parameter? Or, if not, can it be moved to a new helper function, like
> heap_page_prune_init(), rather than having that logic in 2+ places?

Ah, yes. Now that it has two callers, and since it is exclusively an
output parameter, it is quite ugly to initialize it in both callers.
Fixed in the attached.

> - int ndeleted,
> - nnewlpdead;
> -
> - ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
> -    limited_ts, &nnewlpdead, NULL);
> + int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
> +    limited_ts, &presult, NULL);
>
> - I don't particularly like merging the declaration with the
> assignment unless the call is narrow enough that we don't need a line
> break in there, which is not the case here.

I have changed this.

> - I haven't thoroughly investigated yet, but I wonder if there are any
> other places where comments need updating. As a general note, I find
> it desirable for a function's header comment to mention whether any
> pointer parameters are in parameters, out parameters, or in-out
> parameters, and what the contract between caller and callee actually
> is.

I've investigated vacuumlazy.c and pruneheap.c, and looked at the
commit that added the retry loop (8523492d4e349) to see everywhere it
added comments; I don't see anywhere else that needs updating.

I have updated lazy_scan_prune()'s function header comment to describe
the nature of the in-out and output parameters and the contract.

- Melanie
From 6ea7a2aa5698e08ce67d47efd3dbfb54be30d9cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 3 Aug 2023 15:08:58 -0400
Subject: [PATCH v2 1/2] Rebrand LVPagePruneState as PruneResult

With this rename and relocation to heapam.h, we can use PruneResult as
an output parameter for on-access pruning as well. It also makes it
harder to confuse with PruneState.

We'll be adding and removing PruneResult fields in future commits, but
this rename and relocation is separate to make the diff easier to
understand.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 122 ++++++++++++---------------
 src/include/access/heapam.h          |  19 +++++
 src/tools/pgindent/typedefs.list     |   2 +-
 3 files changed, 76 insertions(+), 67 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a41ee635d..5e0a7422bb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,24 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		hastup;			/* Page prevents rel truncation? */
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -250,7 +232,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate);
+							PruneResult *presult);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *hastup, bool *recordfreespace);
@@ -854,7 +836,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
+		PruneResult presult;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -1019,12 +1001,12 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page, &presult);
 
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
+		Assert(!presult.all_visible || !presult.has_lpdead_items);
 
 		/* Remember the location of the last page with nonremovable tuples */
-		if (prunestate.hastup)
+		if (presult.hastup)
 			vacrel->nonempty_pages = blkno + 1;
 
 		if (vacrel->nindexes == 0)
@@ -1037,7 +1019,7 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * performed here can be thought of as the one-pass equivalent of
 			 * a call to lazy_vacuum().
 			 */
-			if (prunestate.has_lpdead_items)
+			if (presult.has_lpdead_items)
 			{
 				Size		freespace;
 
@@ -1064,8 +1046,8 @@ lazy_scan_heap(LVRelState *vacrel)
 				 *
 				 * Our call to lazy_vacuum_heap_page() will have considered if
 				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
+				 * of lazy_scan_prune().  Note that the prune result was
+				 * invalidated by the lazy_vacuum_heap_page() call.
 				 */
 				freespace = PageGetHeapFreeSpace(page);
 
@@ -1078,23 +1060,23 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * There was no call to lazy_vacuum_heap_page() because pruning
 			 * didn't encounter/create any LP_DEAD items that needed to be
 			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
+			 * with presult-driven visibility map and FSM steps (just like
+			 * the two-pass strategy).
 			 */
 			Assert(dead_items->num_items == 0);
 		}
 
 		/*
 		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
+		 * (as of last lazy_scan_skip() call), and from presult
 		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
+		if (!all_visible_according_to_vm && presult.all_visible)
 		{
 			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-			if (prunestate.all_frozen)
+			if (presult.all_frozen)
 			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
+				Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
 				flags |= VISIBILITYMAP_ALL_FROZEN;
 			}
 
@@ -1114,7 +1096,7 @@ lazy_scan_heap(LVRelState *vacrel)
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
+							  vmbuffer, presult.visibility_cutoff_xid,
 							  flags);
 		}
 
@@ -1148,7 +1130,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
 		 * set, however.
 		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
+		else if (presult.has_lpdead_items && PageIsAllVisible(page))
 		{
 			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 				 vacrel->relname, blkno);
@@ -1161,16 +1143,16 @@ lazy_scan_heap(LVRelState *vacrel)
 		/*
 		 * If the all-visible page is all-frozen but not marked as such yet,
 		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
+		 * all_visible is true, so we must check both presult fields.
 		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
+		else if (all_visible_according_to_vm && presult.all_visible &&
+				 presult.all_frozen &&
 				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 		{
 			/*
 			 * Avoid relying on all_visible_according_to_vm as a proxy for the
 			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
+			 * become stale -- even when all_visible is set in presult
 			 */
 			if (!PageIsAllVisible(page))
 			{
@@ -1185,7 +1167,7 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * since a snapshotConflictHorizon sufficient to make everything
 			 * safe for REDO was logged when the page's tuples were frozen.
 			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
 			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 							  vmbuffer, InvalidTransactionId,
 							  VISIBILITYMAP_ALL_VISIBLE |
@@ -1196,7 +1178,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * Final steps for block: drop cleanup lock, record free space in the
 		 * FSM
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
+		if (presult.has_lpdead_items && vacrel->do_index_vacuuming)
 		{
 			/*
 			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
@@ -1514,8 +1496,16 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
+ * Prune and freeze a single page.
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
+ * vacrel is an in-out parameter: lazy_scan_prune() reads relation-wide state
+ * from it and updates it with what lazy_scan_heap() needs to reap dead
+ * tuples, update statistics, and move the relfrozenxid forward.
+ *
+ * presult is an output parameter initialized and updated exclusively by
+ * lazy_scan_prune().
+ *
  * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
  * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
  * whether or not a tuple should be considered DEAD.  This happened when an
@@ -1536,7 +1526,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
-				LVPagePruneState *prunestate)
+				PruneResult *presult)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1595,11 +1585,11 @@ retry:
 	 * Now scan the page to collect LP_DEAD items and check for tuples
 	 * requiring freezing among remaining tuples with storage
 	 */
-	prunestate->hastup = false;
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	presult->hastup = false;
+	presult->has_lpdead_items = false;
+	presult->all_visible = true;
+	presult->all_frozen = true;
+	presult->visibility_cutoff_xid = InvalidTransactionId;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1621,7 +1611,7 @@ retry:
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* page makes rel truncation unsafe */
-			prunestate->hastup = true;
+			presult->hastup = true;
 			continue;
 		}
 
@@ -1701,13 +1691,13 @@ retry:
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (presult->all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
-						prunestate->all_visible = false;
+						presult->all_visible = false;
 						break;
 					}
 
@@ -1719,14 +1709,14 @@ retry:
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						presult->all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						presult->visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1737,7 +1727,7 @@ retry:
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				presult->all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1748,11 +1738,11 @@ retry:
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				presult->all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				presult->all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1766,7 +1756,7 @@ retry:
 				break;
 		}
 
-		prunestate->hastup = true;	/* page makes rel truncation unsafe */
+		presult->hastup = true; /* page makes rel truncation unsafe */
 
 		/* Tuple with storage -- consider need to freeze */
 		if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
@@ -1782,7 +1772,7 @@ retry:
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			presult->all_frozen = false;
 	}
 
 	/*
@@ -1800,7 +1790,7 @@ retry:
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(presult->all_visible && presult->all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1838,11 +1828,11 @@ retry:
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (presult->all_visible && presult->all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = presult->visibility_cutoff_xid;
+				presult->visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1865,7 +1855,7 @@ retry:
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		presult->all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1878,7 +1868,7 @@ retry:
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (presult->all_visible && lpdead_items == 0)
 	{
 		TransactionId cutoff;
 		bool		all_frozen;
@@ -1887,7 +1877,7 @@ retry:
 			Assert(false);
 
 		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+			   cutoff == presult->visibility_cutoff_xid);
 	}
 #endif
 
@@ -1900,7 +1890,7 @@ retry:
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
+		presult->has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1925,7 +1915,7 @@ retry:
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		presult->all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index faf5026519..9a07828e5a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,25 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+/*
+ * State returned from pruning
+ */
+typedef struct PruneResult
+{
+	bool		hastup;			/* Page prevents rel truncation? */
+	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
+
+	/*
+	 * State describes the proper VM bit states to set for the page following
+	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
+	 * trust all_frozen result unless all_visible is also set to true.
+	 */
+	bool		all_visible;	/* Every item visible to all? */
+	bool		all_frozen;		/* provided all_visible is also true */
+	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
+} PruneResult;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..a7a8d2acd1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1394,7 +1394,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
@@ -2152,6 +2151,7 @@ ProjectionPath
 PromptInterruptContext
 ProtocolVersion
 PrsStorage
+PruneResult
 PruneState
 PruneStepResult
 PsqlScanCallbacks
-- 
2.37.2

From ee66907ed8a3af73ec974347dadab433d1a0a80f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 31 Aug 2023 18:15:35 -0400
Subject: [PATCH v2 2/2] Reuse heap_page_prune() tuple visibility statuses

heap_page_prune() obtains the HTSV_Result (tuple visibility status)
returned from HeapTupleSatisfiesVacuum() for every tuple on the page and
stores the results in an array. By making this array available to
heap_page_prune()'s caller lazy_scan_prune(), we can avoid an additional
call to HeapTupleSatisfiesVacuum() when freezing the tuples and
recording LP_DEAD items for vacuum. This saves resources and eliminates
the possibility that vacuuming corrupted data results in a hang due to
endless retry looping.

This replaces the retry mechanism introduced in 8523492d4 to handle cases in
which a tuple's inserting transaction aborted between the visibility check in
heap_page_prune() and lazy_scan_prune()'s call to HeapTupleSatisfiesVacuum() --
rendering it dead but without a dead line pointer. We can instead reuse the
tuple's original visibility status, circumventing any disagreements.

We pass the HTSV array from heap_page_prune() back to lazy_scan_prune() in
PruneResult. It is convenient to move heap_page_prune()'s other output
parameters into the PruneResult.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 66 +++++++++++++++-------------
 src/backend/access/heap/vacuumlazy.c | 54 +++++++----------------
 src/include/access/heapam.h          | 14 +++++-
 3 files changed, 64 insertions(+), 70 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 47b9e20915..512f62e27d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -65,16 +65,6 @@ typedef struct
 	 * 1. Otherwise every access would need to subtract 1.
 	 */
 	bool		marked[MaxHeapTuplesPerPage + 1];
-
-	/*
-	 * Tuple visibility is only computed once for each tuple, for correctness
-	 * and efficiency reasons; see comment in heap_page_prune() for details.
-	 * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
-	 * indicate no visibility has been computed, e.g. for LP_DEAD items.
-	 *
-	 * Same indexing as ->marked.
-	 */
-	int8		htsv[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
 /* Local functions */
@@ -83,7 +73,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 											   Buffer buffer);
 static int	heap_prune_chain(Buffer buffer,
 							 OffsetNumber rootoffnum,
-							 PruneState *prstate);
+							 PruneState *prstate, PruneResult *presult);
 static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -191,6 +181,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 
 	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 	{
+		PruneResult presult;
+
 		/* OK, try to get exclusive buffer lock */
 		if (!ConditionalLockBufferForCleanup(buffer))
 			return;
@@ -202,11 +194,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		 */
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
-			int			ndeleted,
-						nnewlpdead;
+			int			ndeleted;
 
 			ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
-									   limited_ts, &nnewlpdead, NULL);
+									   limited_ts, &presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -222,9 +213,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * tracks ndeleted, since it will set the same LP_DEAD items to
 			 * LP_UNUSED separately.
 			 */
-			if (ndeleted > nnewlpdead)
+			if (ndeleted > presult.nnewlpdead)
 				pgstat_update_heap_dead_tuples(relation,
-											   ndeleted - nnewlpdead);
+											   ndeleted - presult.nnewlpdead);
 		}
 
 		/* And release buffer lock */
@@ -253,12 +244,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * either have been set by TransactionIdLimitedForOldSnapshots, or
  * InvalidTransactionId/0 respectively.
  *
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
- *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
+ * presult contains the output parameters needed by the caller.  For example,
+ * presult->nnewlpdead reports the number of items that were newly set LP_DEAD
+ * during the prune operation.
+ *
  * Returns the number of tuples deleted from the page during this call.
  */
 int
@@ -266,7 +258,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
 				TransactionId old_snap_xmin,
 				TimestampTz old_snap_ts,
-				int *nnewlpdead,
+				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
 	int			ndeleted = 0;
@@ -301,6 +293,19 @@ heap_page_prune(Relation relation, Buffer buffer,
 	maxoff = PageGetMaxOffsetNumber(page);
 	tup.t_tableOid = RelationGetRelid(prstate.rel);
 
+	/* Initialize PruneResult */
+	presult->hastup = false;
+	presult->has_lpdead_items = false;
+	presult->all_visible = true;
+	presult->all_frozen = true;
+	presult->visibility_cutoff_xid = InvalidTransactionId;
+
+	/*
+	 * nnewlpdead only includes those items which were newly set to LP_DEAD
+	 * during pruning.
+	 */
+	presult->nnewlpdead = 0;
+
 	/*
 	 * Determine HTSV for all tuples.
 	 *
@@ -331,7 +336,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsNormal(itemid))
 		{
-			prstate.htsv[offnum] = -1;
+			presult->htsv[offnum] = -1;
 			continue;
 		}
 
@@ -347,8 +352,8 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
-														   buffer);
+		presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+															buffer);
 	}
 
 	/* Scan the page */
@@ -372,7 +377,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 			continue;
 
 		/* Process this item or chain of items */
-		ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+		ndeleted += heap_prune_chain(buffer, offnum, &prstate, presult);
 	}
 
 	/* Clear the offset information once we have processed the given page. */
@@ -473,7 +478,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	END_CRIT_SECTION();
 
 	/* Record number of newly-set-LP_DEAD items for caller */
-	*nnewlpdead = prstate.ndead;
+	presult->nnewlpdead = prstate.ndead;
 
 	return ndeleted;
 }
@@ -588,7 +593,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
  * Returns the number of tuples (to be) deleted from the page.
  */
 static int
-heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+				 PruneState *prstate, PruneResult *presult)
 {
 	int			ndeleted = 0;
 	Page		dp = (Page) BufferGetPage(buffer);
@@ -609,7 +615,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
 	 */
 	if (ItemIdIsNormal(rootlp))
 	{
-		Assert(prstate->htsv[rootoffnum] != -1);
+		Assert(presult->htsv[rootoffnum] != -1);
 		htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
 
 		if (HeapTupleHeaderIsHeapOnly(htup))
@@ -632,7 +638,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
 			 * either here or while following a chain below.  Whichever path
 			 * gets there first will mark the tuple unused.
 			 */
-			if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+			if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
 				!HeapTupleHeaderIsHotUpdated(htup))
 			{
 				heap_prune_record_unused(prstate, rootoffnum);
@@ -700,7 +706,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
 			break;
 
 		Assert(ItemIdIsNormal(lp));
-		Assert(prstate->htsv[offnum] != -1);
+		Assert(presult->htsv[offnum] != -1);
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
 
 		/*
@@ -720,7 +726,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
 		 */
 		tupdead = recent_dead = false;
 
-		switch ((HTSV_Result) prstate->htsv[offnum])
+		switch ((HTSV_Result) presult->htsv[offnum])
 		{
 			case HEAPTUPLE_DEAD:
 				tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5e0a7422bb..25ed500b45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1504,7 +1504,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
  * tuples, update statistics, and move the relfrozenxid forward.
  *
  * presult is an output parameter initialized and updated exclusively by
- * lazy_scan_prune().
+ * lazy_scan_prune() and heap_page_prune().
  *
  * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
  * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
@@ -1514,12 +1514,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
  * of complexity just so we could deal with tuples that were DEAD to VACUUM,
  * but nevertheless were left with storage after pruning.
  *
- * The approach we take now is to restart pruning when the race condition is
- * detected.  This allows heap_page_prune() to prune the tuples inserted by
- * the now-aborted transaction.  This is a little crude, but it guarantees
- * that any items that make it into the dead_items array are simple LP_DEAD
- * line pointers, and that every remaining item with tuple storage is
- * considered as a candidate for freezing.
+ * As of Postgres 17, we circumvent this problem altogether by reusing the
+ * result of heap_page_prune()'s visibility check. Without the second call to
+ * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
+ * disagreement. The tuple's TID won't be added to the array of dead tuple TIDs
+ * for vacuum, so vacuum will still never be tasked with reaping a tuple with
+ * storage.
  */
 static void
 lazy_scan_prune(LVRelState *vacrel,
@@ -1532,14 +1532,11 @@ lazy_scan_prune(LVRelState *vacrel,
 	OffsetNumber offnum,
 				maxoff;
 	ItemId		itemid;
-	HeapTupleData tuple;
-	HTSV_Result res;
 	int			tuples_deleted,
 				tuples_frozen,
 				lpdead_items,
 				live_tuples,
 				recently_dead_tuples;
-	int			nnewlpdead;
 	HeapPageFreeze pagefrz;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1554,8 +1551,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 	maxoff = PageGetMaxOffsetNumber(page);
 
-retry:
-
 	/* Initialize (or reset) page-level state */
 	pagefrz.freeze_required = false;
 	pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1578,23 +1573,19 @@ retry:
 	 * that were deleted from indexes.
 	 */
 	tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
-									 InvalidTransactionId, 0, &nnewlpdead,
+									 InvalidTransactionId, 0,
+									 presult,
 									 &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
 	 * requiring freezing among remaining tuples with storage
 	 */
-	presult->hastup = false;
-	presult->has_lpdead_items = false;
-	presult->all_visible = true;
-	presult->all_frozen = true;
-	presult->visibility_cutoff_xid = InvalidTransactionId;
-
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
 		 offnum = OffsetNumberNext(offnum))
 	{
+		HeapTupleHeader htup;
 		bool		totally_frozen;
 
 		/*
@@ -1637,22 +1628,7 @@ retry:
 
 		Assert(ItemIdIsNormal(itemid));
 
-		ItemPointerSet(&(tuple.t_self), blkno, offnum);
-		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
-		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(rel);
-
-		/*
-		 * DEAD tuples are almost always pruned into LP_DEAD line pointers by
-		 * heap_page_prune(), but it's possible that the tuple state changed
-		 * since heap_page_prune() looked.  Handle that here by restarting.
-		 * (See comments at the top of function for a full explanation.)
-		 */
-		res = HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
-									   buf);
-
-		if (unlikely(res == HEAPTUPLE_DEAD))
-			goto retry;
+		htup = (HeapTupleHeader) PageGetItem(page, itemid);
 
 		/*
 		 * The criteria for counting a tuple as live in this block need to
@@ -1673,7 +1649,7 @@ retry:
 		 * (Cases where we bypass index vacuuming will violate this optimistic
 		 * assumption, but the overall impact of that should be negligible.)
 		 */
-		switch (res)
+		switch (presult->htsv[offnum])
 		{
 			case HEAPTUPLE_LIVE:
 
@@ -1695,7 +1671,7 @@ retry:
 				{
 					TransactionId xmin;
 
-					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+					if (!HeapTupleHeaderXminCommitted(htup))
 					{
 						presult->all_visible = false;
 						break;
@@ -1705,7 +1681,7 @@ retry:
 					 * The inserter definitely committed. But is it old enough
 					 * that everyone sees it as committed?
 					 */
-					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+					xmin = HeapTupleHeaderGetXmin(htup);
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
@@ -1759,7 +1735,7 @@ retry:
 		presult->hastup = true; /* page makes rel truncation unsafe */
 
 		/* Tuple with storage -- consider need to freeze */
-		if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
+		if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
 									  &frozen[tuples_frozen], &totally_frozen))
 		{
 			/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9a07828e5a..bf94bda634 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -199,6 +199,8 @@ typedef struct PruneResult
 	bool		hastup;			/* Page prevents rel truncation? */
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
 
+	int			nnewlpdead;		/* Number of newly LP_DEAD items */
+
 	/*
 	 * State describes the proper VM bit states to set for the page following
 	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
@@ -207,6 +209,16 @@ typedef struct PruneResult
 	bool		all_visible;	/* Every item visible to all? */
 	bool		all_frozen;		/* provided all_visible is also true */
 	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
+
+	/*
+	 * Tuple visibility is only computed once for each tuple, for correctness
+	 * and efficiency reasons; see comment in heap_page_prune() for details.
+	 * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
+	 * indicate no visibility has been computed, e.g. for LP_DEAD items.
+	 *
+	 * 1-based indexing, matching the page's line pointer offset numbers.
+	 */
+	int8		htsv[MaxHeapTuplesPerPage + 1];
 } PruneResult;
 
 
@@ -307,7 +319,7 @@ extern int	heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
 							TransactionId old_snap_xmin,
 							TimestampTz old_snap_ts,
-							int *nnewlpdead,
+							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
 									OffsetNumber *redirected, int nredirected,
-- 
2.37.2
