On Fri, Feb 21, 2025 at 5:00 PM Melanie Plageman
<melanieplage...@gmail.com> wrote:
>
> I thought about doing this, but in the end I decided against this
> approach. If we wanted to make it easy to palloc arrays of different
> sizes and have tbm_extract_page_tuple extract that many tuples into
> the array, we'd have to change more of the code anyway because
> tbm_extract_page_tuple is what tells us how many tuples there are to
> begin with. We could save this somewhere while filling in the
> PagetableEntries initially, but that is a bigger change.
>
> And, passing a length would also mean more callers would have to know
> about TBM_MAX_TUPLES_PER_PAGE, which I think is already kind of a hack
> since it is defined as MaxHeapTuplesPerPage.

I changed my mind. Since the struct I added was only used for
tbm_extract_page_tuple(), it was a bit weird. I also think it is okay
for callers to use TBM_MAX_TUPLES_PER_PAGE. I ended up revising this
to use the same API you implemented for TIDStore in
TidStoreGetBlockOffsets(): the caller passes an array and the size of
the array, and tbm_extract_page_tuple() fills in up to that many
offsets and returns the total number of offsets on the page. It avoids
adding a new struct and means callers could pass a different value
than TBM_MAX_TUPLES_PER_PAGE. Overall, I like it.
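
To illustrate, iteration plus extraction after the full set ends up
looking roughly like this (sketch only, loop body abbreviated;
"iterator" stands in for whichever TBMIterator the caller holds):

    TBMIterateResult tbmres;
    OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
    int          noffsets;

    while (tbm_iterate(&iterator, &tbmres))
    {
        if (!tbmres.lossy)
        {
            /* fills in up to the array size, returns the total count */
            noffsets = tbm_extract_page_tuple(&tbmres, offsets,
                                              TBM_MAX_TUPLES_PER_PAGE);
        }
        /* else: lossy page, examine every tuple on tbmres.blockno */
    }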

- Melanie
From 081fee44b0a3e39f54d35b1f67b3c96fd9a1c553 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 20 Feb 2025 12:30:26 -0500
Subject: [PATCH v33 3/5] Separate TBM[Shared|Private]Iterator and
 TBMIterateResult

Remove the TBMIterateResult member from the TBMPrivateIterator and
TBMSharedIterator and make tbm_[shared|private]_iterate() take a
TBMIterateResult as a parameter.

This allows tidbitmap API users to manage multiple TBMIterateResults per
scan. This is required for bitmap heap scan to use the read stream API,
with which there may be multiple I/Os in flight at once, each one with a
TBMIterateResult.
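
For example (illustrative sketch; MAX_IN_FLIGHT is a made-up
constant), a caller can now keep several results alive at once
because the iterator no longer owns the result storage:

    TBMIterateResult results[MAX_IN_FLIGHT];    /* caller-owned */
    int         nfilled = 0;

    while (nfilled < MAX_IN_FLIGHT &&
           tbm_private_iterate(iterator, &results[nfilled]))
        nfilled++;      /* each call fills the caller's struct */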

Reviewed-by: Tomas Vondra <to...@vondra.me>
Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me
---
 src/backend/access/gin/ginget.c           | 33 ++++----
 src/backend/access/gin/ginscan.c          |  3 +-
 src/backend/access/heap/heapam_handler.c  | 28 +++----
 src/backend/executor/nodeBitmapHeapscan.c | 39 +++++----
 src/backend/nodes/tidbitmap.c             | 96 ++++++++++++-----------
 src/include/access/gin_private.h          |  7 +-
 src/include/nodes/tidbitmap.h             |  6 +-
 7 files changed, 111 insertions(+), 101 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index f3b19d280c3..791bc70ec26 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -332,8 +332,8 @@ restartScanEntry:
 	entry->list = NULL;
 	entry->nlist = 0;
 	entry->matchBitmap = NULL;
-	entry->matchResult = NULL;
 	entry->matchNtuples = -1;
+	entry->matchResult.blockno = InvalidBlockNumber;
 	entry->reduceResult = false;
 	entry->predictNumberResult = 0;
 
@@ -825,20 +825,19 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 		{
 			/*
 			 * If we've exhausted all items on this block, move to next block
-			 * in the bitmap.
+			 * in the bitmap. tbm_private_iterate() sets matchResult.blockno
+			 * to InvalidBlockNumber when the bitmap is exhausted.
 			 */
-			while (entry->matchResult == NULL ||
-				   (!entry->matchResult->lossy &&
+			while ((!BlockNumberIsValid(entry->matchResult.blockno)) ||
+				   (!entry->matchResult.lossy &&
 					entry->offset >= entry->matchNtuples) ||
-				   entry->matchResult->blockno < advancePastBlk ||
+				   entry->matchResult.blockno < advancePastBlk ||
 				   (ItemPointerIsLossyPage(&advancePast) &&
-					entry->matchResult->blockno == advancePastBlk))
+					entry->matchResult.blockno == advancePastBlk))
 			{
-				entry->matchResult =
-					tbm_private_iterate(entry->matchIterator);
-
-				if (entry->matchResult == NULL)
+				if (!tbm_private_iterate(entry->matchIterator, &entry->matchResult))
 				{
+					Assert(!BlockNumberIsValid(entry->matchResult.blockno));
 					ItemPointerSetInvalid(&entry->curItem);
 					tbm_end_private_iterate(entry->matchIterator);
 					entry->matchIterator = NULL;
@@ -847,14 +846,14 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 				}
 
 				/* Exact pages need their tuple offsets extracted. */
-				if (!entry->matchResult->lossy)
-					entry->matchNtuples = tbm_extract_page_tuple(entry->matchResult,
+				if (!entry->matchResult.lossy)
+					entry->matchNtuples = tbm_extract_page_tuple(&entry->matchResult,
 																 entry->matchOffsets,
 																 TBM_MAX_TUPLES_PER_PAGE);
 
 				/*
 				 * Reset counter to the beginning of entry->matchResult. Note:
 				 * entry->offset is still greater than entry->matchNtuples if
 				 * matchResult is lossy.  So, on next call we will get next
 				 * result from TIDBitmap.
 				 */
@@ -867,10 +866,10 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			 * We're now on the first page after advancePast which has any
 			 * items on it. If it's a lossy result, return that.
 			 */
-			if (entry->matchResult->lossy)
+			if (entry->matchResult.lossy)
 			{
 				ItemPointerSetLossyPage(&entry->curItem,
-										entry->matchResult->blockno);
+										entry->matchResult.blockno);
 
 				/*
 				 * We might as well fall out of the loop; we could not
@@ -887,7 +886,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			Assert(entry->matchNtuples > -1);
 
 			/* Skip over any offsets <= advancePast, and return that. */
-			if (entry->matchResult->blockno == advancePastBlk)
+			if (entry->matchResult.blockno == advancePastBlk)
 			{
 				Assert(entry->matchNtuples > 0);
 
@@ -908,7 +907,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			}
 
 			ItemPointerSet(&entry->curItem,
-						   entry->matchResult->blockno,
+						   entry->matchResult.blockno,
 						   entry->matchOffsets[entry->offset]);
 			entry->offset++;
 
diff --git a/src/backend/access/gin/ginscan.c b/src/backend/access/gin/ginscan.c
index 7d1e6615260..f033417e5a8 100644
--- a/src/backend/access/gin/ginscan.c
+++ b/src/backend/access/gin/ginscan.c
@@ -106,7 +106,8 @@ ginFillScanEntry(GinScanOpaque so, OffsetNumber attnum,
 	ItemPointerSetMin(&scanEntry->curItem);
 	scanEntry->matchBitmap = NULL;
 	scanEntry->matchIterator = NULL;
-	scanEntry->matchResult = NULL;
+	scanEntry->matchResult.blockno = InvalidBlockNumber;
+	scanEntry->matchNtuples = -1;
 	scanEntry->list = NULL;
 	scanEntry->nlist = 0;
 	scanEntry->offset = InvalidOffsetNumber;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e78682c3cef..ff1d8085883 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2126,7 +2126,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	Buffer		buffer;
 	Snapshot	snapshot;
 	int			ntup;
-	TBMIterateResult *tbmres;
+	TBMIterateResult tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
 
@@ -2142,14 +2142,12 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	{
 		CHECK_FOR_INTERRUPTS();
 
-		tbmres = tbm_iterate(&scan->st.rs_tbmiterator);
-
-		if (tbmres == NULL)
+		if (!tbm_iterate(&scan->st.rs_tbmiterator, &tbmres))
 			return false;
 
 		/* Exact pages need their tuple offsets extracted. */
-		if (!tbmres->lossy)
-			noffsets = tbm_extract_page_tuple(tbmres, offsets,
+		if (!tbmres.lossy)
+			noffsets = tbm_extract_page_tuple(&tbmres, offsets,
 											  TBM_MAX_TUPLES_PER_PAGE);
 
 		/*
@@ -2161,11 +2159,11 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 		 * reachable by the index.
 		 */
 	} while (!IsolationIsSerializable() &&
-			 tbmres->blockno >= hscan->rs_nblocks);
+			 tbmres.blockno >= hscan->rs_nblocks);
 
 	/* Got a valid block */
-	*blockno = tbmres->blockno;
-	*recheck = tbmres->recheck;
+	*blockno = tbmres.blockno;
+	*recheck = tbmres.recheck;
 
 	/*
 	 * We can skip fetching the heap page if we don't need any fields from the
@@ -2173,11 +2171,11 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	 * page are visible to our transaction.
 	 */
 	if (!(scan->rs_flags & SO_NEED_TUPLES) &&
-		!tbmres->recheck &&
-		VM_ALL_VISIBLE(scan->rs_rd, tbmres->blockno, &bscan->rs_vmbuffer))
+		!tbmres.recheck &&
+		VM_ALL_VISIBLE(scan->rs_rd, tbmres.blockno, &bscan->rs_vmbuffer))
 	{
 		/* can't be lossy in the skip_fetch case */
-		Assert(!tbmres->lossy);
+		Assert(!tbmres.lossy);
 		Assert(bscan->rs_empty_tuples_pending >= 0);
 		Assert(noffsets > -1);
 
@@ -2186,7 +2184,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 		return true;
 	}
 
-	block = tbmres->blockno;
+	block = tbmres.blockno;
 
 	/*
 	 * Acquire pin on the target heap page, trading in any pin we held before.
@@ -2215,7 +2213,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * We need two separate strategies for lossy and non-lossy cases.
 	 */
-	if (!tbmres->lossy)
+	if (!tbmres.lossy)
 	{
 		/*
 		 * Bitmap is non-lossy, so we just look through the offsets listed in
@@ -2279,7 +2277,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	Assert(ntup <= MaxHeapTuplesPerPage);
 	hscan->rs_ntuples = ntup;
 
-	if (tbmres->lossy)
+	if (tbmres.lossy)
 		(*lossy_pages)++;
 	else
 		(*exact_pages)++;
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index be0d24d901b..3b4ea0f6144 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -317,7 +317,7 @@ BitmapAdjustPrefetchIterator(BitmapHeapScanState *node)
 {
 #ifdef USE_PREFETCH
 	ParallelBitmapHeapState *pstate = node->pstate;
-	TBMIterateResult *tbmpre;
+	TBMIterateResult tbmpre;
 
 	if (pstate == NULL)
 	{
@@ -330,9 +330,8 @@ BitmapAdjustPrefetchIterator(BitmapHeapScanState *node)
 		}
 		else if (!tbm_exhausted(prefetch_iterator))
 		{
-			tbmpre = tbm_iterate(prefetch_iterator);
-			node->prefetch_blockno = tbmpre ? tbmpre->blockno :
-				InvalidBlockNumber;
+			tbm_iterate(prefetch_iterator, &tbmpre);
+			node->prefetch_blockno = tbmpre.blockno;
 		}
 		return;
 	}
@@ -371,9 +370,8 @@ BitmapAdjustPrefetchIterator(BitmapHeapScanState *node)
 			 */
 			if (!tbm_exhausted(prefetch_iterator))
 			{
-				tbmpre = tbm_iterate(prefetch_iterator);
-				node->prefetch_blockno = tbmpre ? tbmpre->blockno :
-					InvalidBlockNumber;
+				tbm_iterate(prefetch_iterator, &tbmpre);
+				node->prefetch_blockno = tbmpre.blockno;
 			}
 		}
 	}
@@ -441,17 +439,18 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 		{
 			while (node->prefetch_pages < node->prefetch_target)
 			{
-				TBMIterateResult *tbmpre = tbm_iterate(prefetch_iterator);
+				TBMIterateResult tbmpre;
 				bool		skip_fetch;
 
-				if (tbmpre == NULL)
+				if (!tbm_iterate(prefetch_iterator, &tbmpre))
 				{
 					/* No more pages to prefetch */
+					Assert(!BlockNumberIsValid(tbmpre.blockno));
 					tbm_end_iterate(prefetch_iterator);
 					break;
 				}
 				node->prefetch_pages++;
-				node->prefetch_blockno = tbmpre->blockno;
+				node->prefetch_blockno = tbmpre.blockno;
 
 				/*
 				 * If we expect not to have to actually read this heap page,
@@ -460,13 +459,13 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 				 * prefetch_pages?)
 				 */
 				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
-							  !tbmpre->recheck &&
+							  !tbmpre.recheck &&
 							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
-											 tbmpre->blockno,
+											 tbmpre.blockno,
 											 &node->pvmbuffer));
 
 				if (!skip_fetch)
-					PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre->blockno);
+					PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre.blockno);
 			}
 		}
 
@@ -481,7 +480,7 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 		{
 			while (1)
 			{
-				TBMIterateResult *tbmpre;
+				TBMIterateResult tbmpre;
 				bool		do_prefetch = false;
 				bool		skip_fetch;
 
@@ -500,25 +499,25 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 				if (!do_prefetch)
 					return;
 
-				tbmpre = tbm_iterate(prefetch_iterator);
-				if (tbmpre == NULL)
+				if (!tbm_iterate(prefetch_iterator, &tbmpre))
 				{
+					Assert(!BlockNumberIsValid(tbmpre.blockno));
 					/* No more pages to prefetch */
 					tbm_end_iterate(prefetch_iterator);
 					break;
 				}
 
-				node->prefetch_blockno = tbmpre->blockno;
+				node->prefetch_blockno = tbmpre.blockno;
 
 				/* As above, skip prefetch if we expect not to need page */
 				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
-							  !tbmpre->recheck &&
+							  !tbmpre.recheck &&
 							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
-											 tbmpre->blockno,
+											 tbmpre.blockno,
 											 &node->pvmbuffer));
 
 				if (!skip_fetch)
-					PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre->blockno);
+					PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre.blockno);
 			}
 		}
 	}
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 3d835024caa..41031aa8f2f 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -172,7 +172,6 @@ struct TBMPrivateIterator
 	int			spageptr;		/* next spages index */
 	int			schunkptr;		/* next schunks index */
 	int			schunkbit;		/* next bit to check in current schunk */
-	TBMIterateResult output;
 };
 
 /*
@@ -213,7 +212,6 @@ struct TBMSharedIterator
 	PTEntryArray *ptbase;		/* pagetable element array */
 	PTIterationArray *ptpages;	/* sorted exact page index list */
 	PTIterationArray *ptchunks; /* sorted lossy page index list */
-	TBMIterateResult output;
 };
 
 /* Local function prototypes */
@@ -957,21 +955,28 @@ tbm_advance_schunkbit(PagetableEntry *chunk, int *schunkbitp)
 /*
  * tbm_private_iterate - scan through next page of a TIDBitmap
  *
- * Returns a TBMIterateResult representing one page, or NULL if there are
- * no more pages to scan.  Pages are guaranteed to be delivered in numerical
- * order.  If lossy is true, then the bitmap is "lossy" and failed to
- * remember the exact tuples to look at on this page --- the caller must
- * examine all tuples on the page and check if they meet the intended
- * condition.  result->ntuples is set to -1 when the bitmap is lossy.
- * If result->recheck is true, only the indicated tuples need
- * be examined, but the condition must be rechecked anyway.  (For ease of
- * testing, recheck is always set true when lossy is true.)
+ * Caller must pass in a TBMIterateResult to be filled.
+ *
+ * Pages are guaranteed to be delivered in numerical order.
+ *
+ * Returns false when there are no more pages to scan and true otherwise. When
+ * there are no more pages to scan, tbmres->blockno is set to
+ * InvalidBlockNumber.
+ *
+ * If lossy is true, then the bitmap is "lossy" and failed to remember
+ * the exact tuples to look at on this page --- the caller must examine all
+ * tuples on the page and check if they meet the intended condition. If lossy
+ * is false, the caller must later extract the tuple offsets from the page
+ * pointed to by internal_page with tbm_extract_page_tuple.
+ *
+ * If tbmres->recheck is true, only the indicated tuples need be examined, but
+ * the condition must be rechecked anyway.  (For ease of testing, recheck is
+ * always set true when lossy is true.)
  */
-TBMIterateResult *
-tbm_private_iterate(TBMPrivateIterator *iterator)
+bool
+tbm_private_iterate(TBMPrivateIterator *iterator, TBMIterateResult *tbmres)
 {
 	TIDBitmap  *tbm = iterator->tbm;
-	TBMIterateResult *output = &(iterator->output);
 
 	Assert(tbm->iterating == TBM_ITERATING_PRIVATE);
 
@@ -1009,12 +1014,12 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
 			chunk_blockno < tbm->spages[iterator->spageptr]->blockno)
 		{
 			/* Return a lossy page indicator from the chunk */
-			output->blockno = chunk_blockno;
-			output->lossy = true;
-			output->recheck = true;
-			output->internal_page = NULL;
+			tbmres->blockno = chunk_blockno;
+			tbmres->lossy = true;
+			tbmres->recheck = true;
+			tbmres->internal_page = NULL;
 			iterator->schunkbit++;
-			return output;
+			return true;
 		}
 	}
 
@@ -1028,16 +1033,17 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
 		else
 			page = tbm->spages[iterator->spageptr];
 
-		output->internal_page = page;
-		output->blockno = page->blockno;
-		output->lossy = false;
-		output->recheck = page->recheck;
+		tbmres->internal_page = page;
+		tbmres->blockno = page->blockno;
+		tbmres->lossy = false;
+		tbmres->recheck = page->recheck;
 		iterator->spageptr++;
-		return output;
+		return true;
 	}
 
 	/* Nothing more in the bitmap */
-	return NULL;
+	tbmres->blockno = InvalidBlockNumber;
+	return false;
 }
 
 /*
@@ -1047,10 +1053,9 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
  *	across multiple processes.  We need to acquire the iterator LWLock,
  *	before accessing the shared members.
  */
-TBMIterateResult *
-tbm_shared_iterate(TBMSharedIterator *iterator)
+bool
+tbm_shared_iterate(TBMSharedIterator *iterator, TBMIterateResult *tbmres)
 {
-	TBMIterateResult *output = &iterator->output;
 	TBMSharedIteratorState *istate = iterator->state;
 	PagetableEntry *ptbase = NULL;
 	int		   *idxpages = NULL;
@@ -1101,14 +1106,14 @@ tbm_shared_iterate(TBMSharedIterator *iterator)
 			chunk_blockno < ptbase[idxpages[istate->spageptr]].blockno)
 		{
 			/* Return a lossy page indicator from the chunk */
-			output->blockno = chunk_blockno;
-			output->lossy = true;
-			output->recheck = true;
-			output->internal_page = NULL;
+			tbmres->blockno = chunk_blockno;
+			tbmres->lossy = true;
+			tbmres->recheck = true;
+			tbmres->internal_page = NULL;
 			istate->schunkbit++;
 
 			LWLockRelease(&istate->lock);
-			return output;
+			return true;
 		}
 	}
 
@@ -1116,21 +1121,22 @@ tbm_shared_iterate(TBMSharedIterator *iterator)
 	{
 		PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
 
-		output->internal_page = page;
-		output->blockno = page->blockno;
-		output->lossy = false;
-		output->recheck = page->recheck;
+		tbmres->internal_page = page;
+		tbmres->blockno = page->blockno;
+		tbmres->lossy = false;
+		tbmres->recheck = page->recheck;
 		istate->spageptr++;
 
 		LWLockRelease(&istate->lock);
 
-		return output;
+		return true;
 	}
 
 	LWLockRelease(&istate->lock);
 
 	/* Nothing more in the bitmap */
-	return NULL;
+	tbmres->blockno = InvalidBlockNumber;
+	return false;
 }
 
 /*
@@ -1604,15 +1610,17 @@ tbm_end_iterate(TBMIterator *iterator)
 }
 
 /*
- * Get the next TBMIterateResult from the shared or private bitmap iterator.
+ * Populate the next TBMIterateResult using the shared or private bitmap
+ * iterator. Returns false when there is nothing more to scan.
  */
-TBMIterateResult *
-tbm_iterate(TBMIterator *iterator)
+bool
+tbm_iterate(TBMIterator *iterator, TBMIterateResult *tbmres)
 {
 	Assert(iterator);
+	Assert(tbmres);
 
 	if (iterator->shared)
-		return tbm_shared_iterate(iterator->i.shared_iterator);
+		return tbm_shared_iterate(iterator->i.shared_iterator, tbmres);
 	else
-		return tbm_private_iterate(iterator->i.private_iterator);
+		return tbm_private_iterate(iterator->i.private_iterator, tbmres);
 }
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 50478db9820..82d2c28583e 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -353,7 +353,12 @@ typedef struct GinScanEntryData
 	/* for a partial-match or full-scan query, we accumulate all TIDs here */
 	TIDBitmap  *matchBitmap;
 	TBMPrivateIterator *matchIterator;
-	TBMIterateResult *matchResult;
+
+	/*
+	 * If blockno is InvalidBlockNumber, all of the other fields in the
+	 * matchResult are meaningless.
+	 */
+	TBMIterateResult matchResult;
 	OffsetNumber matchOffsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			matchNtuples;
 
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index e185635c10b..99f795ceab5 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -101,8 +101,8 @@ extern bool tbm_is_empty(const TIDBitmap *tbm);
 
 extern TBMPrivateIterator *tbm_begin_private_iterate(TIDBitmap *tbm);
 extern dsa_pointer tbm_prepare_shared_iterate(TIDBitmap *tbm);
-extern TBMIterateResult *tbm_private_iterate(TBMPrivateIterator *iterator);
-extern TBMIterateResult *tbm_shared_iterate(TBMSharedIterator *iterator);
+extern bool tbm_private_iterate(TBMPrivateIterator *iterator, TBMIterateResult *tbmres);
+extern bool tbm_shared_iterate(TBMSharedIterator *iterator, TBMIterateResult *tbmres);
 extern void tbm_end_private_iterate(TBMPrivateIterator *iterator);
 extern void tbm_end_shared_iterate(TBMSharedIterator *iterator);
 extern TBMSharedIterator *tbm_attach_shared_iterate(dsa_area *dsa,
@@ -113,7 +113,7 @@ extern TBMIterator tbm_begin_iterate(TIDBitmap *tbm,
 									 dsa_area *dsa, dsa_pointer dsp);
 extern void tbm_end_iterate(TBMIterator *iterator);
 
-extern TBMIterateResult *tbm_iterate(TBMIterator *iterator);
+extern bool tbm_iterate(TBMIterator *iterator, TBMIterateResult *tbmres);
 
 static inline bool
 tbm_exhausted(TBMIterator *iterator)
-- 
2.34.1

From 0305ae391c883dfea7482cdeff2c9abe131c4cd0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 21 Feb 2025 15:15:47 -0500
Subject: [PATCH v33 4/5] BitmapHeapScan uses the read stream API

Make Bitmap Heap Scan use the read stream API instead of invoking
ReadBuffer() for each block indicated by the bitmap.

The read stream API handles prefetching, so remove all of the explicit
prefetching from the bitmap heap scan code.

Now the heap table AM implements a read stream callback that uses the
bitmap iterator to supply the next required block to the read stream
code.
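
In outline, the callback side looks like this (condensed from the
heapam.c hunk below; the end-of-relation and skip-fetch checks are
omitted):

    static BlockNumber
    bitmapheap_stream_read_next(ReadStream *pgsr, void *private_data,
                                void *per_buffer_data)
    {
        TBMIterateResult *tbmres = per_buffer_data;
        BitmapHeapScanDesc bscan = (BitmapHeapScanDesc) private_data;
        TableScanDesc sscan = &((HeapScanDesc) bscan)->rs_base;

        if (!tbm_iterate(&sscan->st.rs_tbmiterator, tbmres))
            return InvalidBlockNumber;  /* bitmap exhausted */

        return tbmres->blockno;
    }

heapam_scan_bitmap_next_block() then pairs each buffer returned by
read_stream_next_buffer() with the TBMIterateResult stashed in its
per-buffer data.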

Reviewed-by: Tomas Vondra <to...@vondra.me>
Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me
---
 src/backend/access/heap/heapam.c          |  80 +++++
 src/backend/access/heap/heapam_handler.c  |  95 +++---
 src/backend/executor/nodeBitmapHeapscan.c | 341 +---------------------
 src/include/access/tableam.h              |  25 +-
 src/include/nodes/execnodes.h             |  23 +-
 5 files changed, 130 insertions(+), 434 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..65e44b25b74 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -279,6 +279,72 @@ heap_scan_stream_read_next_serial(ReadStream *stream,
 	return scan->rs_prefetch_block;
 }
 
+/*
+ * Read stream API callback for bitmap heap scans.
+ * Returns the next block the caller wants from the read stream or
+ * InvalidBlockNumber when done.
+ */
+static BlockNumber
+bitmapheap_stream_read_next(ReadStream *pgsr, void *private_data,
+							void *per_buffer_data)
+{
+	TBMIterateResult *tbmres = per_buffer_data;
+	BitmapHeapScanDesc bscan = (BitmapHeapScanDesc) private_data;
+	HeapScanDesc hscan = (HeapScanDesc) bscan;
+	TableScanDesc sscan = &hscan->rs_base;
+
+	for (;;)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		/* no more entries in the bitmap */
+		if (!tbm_iterate(&sscan->st.rs_tbmiterator, tbmres))
+			return InvalidBlockNumber;
+
+		/*
+		 * Ignore any claimed entries past what we think is the end of the
+		 * relation. It may have been extended after the start of our scan (we
+		 * only hold an AccessShareLock, and it could be inserts from this
+		 * backend).  We don't take this optimization in SERIALIZABLE
+		 * isolation though, as we need to examine all invisible tuples
+		 * reachable by the index.
+		 */
+		if (!IsolationIsSerializable() &&
+			tbmres->blockno >= hscan->rs_nblocks)
+			continue;
+
+		/*
+		 * We can skip fetching the heap page if we don't need any fields from
+		 * the heap, the bitmap entries don't need rechecking, and all tuples
+		 * on the page are visible to our transaction.
+		 */
+		if (!(sscan->rs_flags & SO_NEED_TUPLES) &&
+			!tbmres->recheck &&
+			VM_ALL_VISIBLE(sscan->rs_rd, tbmres->blockno, &bscan->rs_vmbuffer))
+		{
+			OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
+			int			noffsets;
+
+			/* can't be lossy in the skip_fetch case */
+			Assert(!tbmres->lossy);
+			Assert(bscan->rs_empty_tuples_pending >= 0);
+
+			/*
+			 * We throw away the offsets, but this is the easiest way to get a
+			 * count of tuples.
+			 */
+			noffsets = tbm_extract_page_tuple(tbmres, offsets, TBM_MAX_TUPLES_PER_PAGE);
+			bscan->rs_empty_tuples_pending += noffsets;
+			continue;
+		}
+
+		return tbmres->blockno;
+	}
+
+	/* not reachable */
+	Assert(false);
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -1067,6 +1133,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	scan->rs_base.rs_flags = flags;
 	scan->rs_base.rs_parallel = parallel_scan;
 	scan->rs_strategy = NULL;	/* set in initscan */
+	scan->rs_cbuf = InvalidBuffer;
 
 	/*
 	 * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
@@ -1146,6 +1213,16 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  scan,
 														  0);
 	}
+	else if (scan->rs_base.rs_flags & SO_TYPE_BITMAPSCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_DEFAULT,
+														  scan->rs_strategy,
+														  scan->rs_base.rs_rd,
+														  MAIN_FORKNUM,
+														  bitmapheap_stream_read_next,
+														  scan,
+														  sizeof(TBMIterateResult));
+	}
 
 
 	return (TableScanDesc) scan;
@@ -1180,7 +1257,10 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	 * unpin scan buffers
 	 */
 	if (BufferIsValid(scan->rs_cbuf))
+	{
 		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
 
 	if (scan->rs_base.rs_flags & SO_TYPE_BITMAPSCAN)
 	{
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ff1d8085883..4734283a9a9 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2117,82 +2117,79 @@ heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
 
 static bool
 heapam_scan_bitmap_next_block(TableScanDesc scan,
-							  BlockNumber *blockno, bool *recheck,
+							  bool *recheck,
 							  uint64 *lossy_pages, uint64 *exact_pages)
 {
 	BitmapHeapScanDesc bscan = (BitmapHeapScanDesc) scan;
 	HeapScanDesc hscan = (HeapScanDesc) bscan;
 	BlockNumber block;
+	void	   *per_buffer_data;
 	Buffer		buffer;
 	Snapshot	snapshot;
 	int			ntup;
-	TBMIterateResult tbmres;
+	TBMIterateResult *tbmres;
 	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
 	int			noffsets = -1;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
+	Assert(hscan->rs_read_stream);
 
 	hscan->rs_cindex = 0;
 	hscan->rs_ntuples = 0;
 
-	*blockno = InvalidBlockNumber;
-	*recheck = true;
-
-	do
+	/* Release buffer containing previous block. */
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		CHECK_FOR_INTERRUPTS();
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
+	}
 
-		if (!tbm_iterate(&scan->st.rs_tbmiterator, &tbmres))
-			return false;
+	hscan->rs_cbuf = read_stream_next_buffer(hscan->rs_read_stream,
+											 &per_buffer_data);
 
-		/* Exact pages need their tuple offsets extracted. */
-		if (!tbmres.lossy)
-			noffsets = tbm_extract_page_tuple(&tbmres, offsets,
-											  TBM_MAX_TUPLES_PER_PAGE);
+	if (BufferIsInvalid(hscan->rs_cbuf))
+	{
+		if (BufferIsValid(bscan->rs_vmbuffer))
+		{
+			ReleaseBuffer(bscan->rs_vmbuffer);
+			bscan->rs_vmbuffer = InvalidBuffer;
+		}
 
 		/*
-		 * Ignore any claimed entries past what we think is the end of the
-		 * relation. It may have been extended after the start of our scan (we
-		 * only hold an AccessShareLock, and it could be inserts from this
-		 * backend).  We don't take this optimization in SERIALIZABLE
-		 * isolation though, as we need to examine all invisible tuples
-		 * reachable by the index.
+		 * Bitmap is exhausted. Time to emit empty tuples if relevant. We emit
+		 * all empty tuples at the end instead of emitting them per block we
+		 * skip fetching. This is necessary because the streaming read API
+		 * will only return TBMIterateResults for blocks actually fetched.
+		 * When we skip fetching a block, we keep track of how many empty
+		 * tuples to emit at the end of the BitmapHeapScan. We do not recheck
+		 * all NULL tuples.
 		 */
-	} while (!IsolationIsSerializable() &&
-			 tbmres.blockno >= hscan->rs_nblocks);
+		*recheck = false;
+		return bscan->rs_empty_tuples_pending > 0;
+	}
+
+	Assert(per_buffer_data);
+
+	tbmres = per_buffer_data;
 
-	/* Got a valid block */
-	*blockno = tbmres.blockno;
-	*recheck = tbmres.recheck;
+	Assert(BlockNumberIsValid(tbmres->blockno));
+	Assert(BufferGetBlockNumber(hscan->rs_cbuf) == tbmres->blockno);
 
 	/*
-	 * We can skip fetching the heap page if we don't need any fields from the
-	 * heap, the bitmap entries don't need rechecking, and all tuples on the
-	 * page are visible to our transaction.
+	 * Exact pages need their tuple offsets extracted;
+	 * tbm_extract_page_tuple() returns the number of offsets on the
+	 * page.
 	 */
-	if (!(scan->rs_flags & SO_NEED_TUPLES) &&
-		!tbmres.recheck &&
-		VM_ALL_VISIBLE(scan->rs_rd, tbmres.blockno, &bscan->rs_vmbuffer))
-	{
-		/* can't be lossy in the skip_fetch case */
-		Assert(!tbmres.lossy);
-		Assert(bscan->rs_empty_tuples_pending >= 0);
-		Assert(noffsets > -1);
+	if (!tbmres->lossy)
+		noffsets = tbm_extract_page_tuple(tbmres, offsets,
+										  TBM_MAX_TUPLES_PER_PAGE);
 
-		bscan->rs_empty_tuples_pending += noffsets;
+	*recheck = tbmres->recheck;
 
-		return true;
-	}
-
-	block = tbmres.blockno;
+	hscan->rs_cblock = tbmres->blockno;
+	hscan->rs_ntuples = noffsets;
 
-	/*
-	 * Acquire pin on the target heap page, trading in any pin we held before.
-	 */
-	hscan->rs_cbuf = ReleaseAndReadBuffer(hscan->rs_cbuf,
-										  scan->rs_rd,
-										  block);
-	hscan->rs_cblock = block;
+	block = tbmres->blockno;
 	buffer = hscan->rs_cbuf;
 	snapshot = scan->rs_snapshot;
 
@@ -2213,7 +2210,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * We need two separate strategies for lossy and non-lossy cases.
 	 */
-	if (!tbmres.lossy)
+	if (!tbmres->lossy)
 	{
 		/*
 		 * Bitmap is non-lossy, so we just look through the offsets listed in
@@ -2277,7 +2274,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	Assert(ntup <= MaxHeapTuplesPerPage);
 	hscan->rs_ntuples = ntup;
 
-	if (tbmres.lossy)
+	if (tbmres->lossy)
 		(*lossy_pages)++;
 	else
 		(*exact_pages)++;
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 3b4ea0f6144..6df34094a13 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -51,10 +51,6 @@
 static void BitmapTableScanSetup(BitmapHeapScanState *node);
 static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
 static inline void BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate);
-static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node);
-static inline void BitmapAdjustPrefetchTarget(BitmapHeapScanState *node);
-static inline void BitmapPrefetch(BitmapHeapScanState *node,
-								  TableScanDesc scan);
 static bool BitmapShouldInitializeSharedState(ParallelBitmapHeapState *pstate);
 
 
@@ -62,14 +58,6 @@ static bool BitmapShouldInitializeSharedState(ParallelBitmapHeapState *pstate);
  * Do the underlying index scan, build the bitmap, set up the parallel state
  * needed for parallel workers to iterate through the bitmap, and set up the
  * underlying table scan descriptor.
- *
- * For prefetching, we use *two* iterators, one for the pages we are actually
- * scanning and another that runs ahead of the first for prefetching.
- * node->prefetch_pages tracks exactly how many pages ahead the prefetch
- * iterator is.  Also, node->prefetch_target tracks the desired prefetch
- * distance, which starts small and increases up to the
- * node->prefetch_maximum.  This is to avoid doing a lot of prefetching in a
- * scan that stops after a few tuples because of a LIMIT.
  */
 static void
 BitmapTableScanSetup(BitmapHeapScanState *node)
@@ -102,14 +90,6 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 		 */
 		pstate->tbmiterator = tbm_prepare_shared_iterate(node->tbm);
 
-#ifdef USE_PREFETCH
-		if (node->prefetch_maximum > 0)
-		{
-			pstate->prefetch_iterator =
-				tbm_prepare_shared_iterate(node->tbm);
-		}
-#endif							/* USE_PREFETCH */
-
 		/* We have initialized the shared state so wake up others. */
 		BitmapDoneInitializingSharedState(pstate);
 	}
@@ -119,15 +99,6 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 									pstate->tbmiterator :
 									InvalidDsaPointer);
 
-#ifdef USE_PREFETCH
-	if (node->prefetch_maximum > 0)
-		node->prefetch_iterator =
-			tbm_begin_iterate(node->tbm, dsa,
-							  pstate ?
-							  pstate->prefetch_iterator :
-							  InvalidDsaPointer);
-#endif							/* USE_PREFETCH */
-
 	/*
 	 * If this is the first scan of the underlying table, create the table
 	 * scan descriptor and begin the scan.
@@ -158,7 +129,6 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	node->initialized = true;
 }
 
-
 /* ----------------------------------------------------------------
  *		BitmapHeapNext
  *
@@ -172,10 +142,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	TableScanDesc scan;
 	TupleTableSlot *slot;
 
-#ifdef USE_PREFETCH
-	ParallelBitmapHeapState *pstate = node->pstate;
-#endif
-
 	/*
 	 * extract necessary information from index scan node
 	 */
@@ -204,37 +170,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			CHECK_FOR_INTERRUPTS();
 
-#ifdef USE_PREFETCH
-
-			/*
-			 * Try to prefetch at least a few pages even before we get to the
-			 * second page if we don't stop reading after the first tuple.
-			 */
-			if (!pstate)
-			{
-				if (node->prefetch_target < node->prefetch_maximum)
-					node->prefetch_target++;
-			}
-			else if (pstate->prefetch_target < node->prefetch_maximum)
-			{
-				/* take spinlock while updating shared state */
-				SpinLockAcquire(&pstate->mutex);
-				if (pstate->prefetch_target < node->prefetch_maximum)
-					pstate->prefetch_target++;
-				SpinLockRelease(&pstate->mutex);
-			}
-#endif							/* USE_PREFETCH */
-
-			/*
-			 * We issue prefetch requests *after* fetching the current page to
-			 * try to avoid having prefetching interfere with the main I/O.
-			 * Also, this should happen only when we have determined there is
-			 * still something to do on the current page, else we may
-			 * uselessly prefetch the same page we are just about to request
-			 * for real.
-			 */
-			BitmapPrefetch(node, scan);
-
 			/*
 			 * If we are using lossy info, we have to recheck the qual
 			 * conditions at every tuple.
@@ -257,31 +192,15 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 new_page:
 
-		BitmapAdjustPrefetchIterator(node);
-
 		/*
 		 * Returns false if the bitmap is exhausted and there are no further
 		 * blocks we need to scan.
 		 */
-		if (!table_scan_bitmap_next_block(scan, &node->blockno,
+		if (!table_scan_bitmap_next_block(scan,
 										  &node->recheck,
 										  &node->stats.lossy_pages,
 										  &node->stats.exact_pages))
 			break;
-
-		/*
-		 * If serial, we can error out if the prefetch block doesn't stay
-		 * ahead of the current block.
-		 */
-		if (node->pstate == NULL &&
-			!tbm_exhausted(&node->prefetch_iterator) &&
-			node->prefetch_blockno < node->blockno)
-			elog(ERROR,
-				 "prefetch and main iterators are out of sync. pfblockno: %d. blockno: %d",
-				 node->prefetch_blockno, node->blockno);
-
-		/* Adjust the prefetch target */
-		BitmapAdjustPrefetchTarget(node);
 	}
 
 	/*
@@ -305,225 +224,6 @@ BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate)
 	ConditionVariableBroadcast(&pstate->cv);
 }
 
-/*
- *	BitmapAdjustPrefetchIterator - Adjust the prefetch iterator
- *
- *	We keep track of how far the prefetch iterator is ahead of the main
- *	iterator in prefetch_pages. For each block the main iterator returns, we
- *	decrement prefetch_pages.
- */
-static inline void
-BitmapAdjustPrefetchIterator(BitmapHeapScanState *node)
-{
-#ifdef USE_PREFETCH
-	ParallelBitmapHeapState *pstate = node->pstate;
-	TBMIterateResult tbmpre;
-
-	if (pstate == NULL)
-	{
-		TBMIterator *prefetch_iterator = &node->prefetch_iterator;
-
-		if (node->prefetch_pages > 0)
-		{
-			/* The main iterator has closed the distance by one page */
-			node->prefetch_pages--;
-		}
-		else if (!tbm_exhausted(prefetch_iterator))
-		{
-			tbm_iterate(prefetch_iterator, &tbmpre);
-			node->prefetch_blockno = tbmpre.blockno;
-		}
-		return;
-	}
-
-	/*
-	 * XXX: There is a known issue with keeping the prefetch and current block
-	 * iterators in sync for parallel bitmap table scans. This can lead to
-	 * prefetching blocks that have already been read. See the discussion
-	 * here:
-	 * https://postgr.es/m/20240315211449.en2jcmdqxv5o6tlz%40alap3.anarazel.de
-	 * Note that moving the call site of BitmapAdjustPrefetchIterator()
-	 * exacerbates the effects of this bug.
-	 */
-	if (node->prefetch_maximum > 0)
-	{
-		TBMIterator *prefetch_iterator = &node->prefetch_iterator;
-
-		SpinLockAcquire(&pstate->mutex);
-		if (pstate->prefetch_pages > 0)
-		{
-			pstate->prefetch_pages--;
-			SpinLockRelease(&pstate->mutex);
-		}
-		else
-		{
-			/* Release the mutex before iterating */
-			SpinLockRelease(&pstate->mutex);
-
-			/*
-			 * In case of shared mode, we can not ensure that the current
-			 * blockno of the main iterator and that of the prefetch iterator
-			 * are same.  It's possible that whatever blockno we are
-			 * prefetching will be processed by another process.  Therefore,
-			 * we don't validate the blockno here as we do in non-parallel
-			 * case.
-			 */
-			if (!tbm_exhausted(prefetch_iterator))
-			{
-				tbm_iterate(prefetch_iterator, &tbmpre);
-				node->prefetch_blockno = tbmpre.blockno;
-			}
-		}
-	}
-#endif							/* USE_PREFETCH */
-}
-
-/*
- * BitmapAdjustPrefetchTarget - Adjust the prefetch target
- *
- * Increase prefetch target if it's not yet at the max.  Note that
- * we will increase it to zero after fetching the very first
- * page/tuple, then to one after the second tuple is fetched, then
- * it doubles as later pages are fetched.
- */
-static inline void
-BitmapAdjustPrefetchTarget(BitmapHeapScanState *node)
-{
-#ifdef USE_PREFETCH
-	ParallelBitmapHeapState *pstate = node->pstate;
-
-	if (pstate == NULL)
-	{
-		if (node->prefetch_target >= node->prefetch_maximum)
-			 /* don't increase any further */ ;
-		else if (node->prefetch_target >= node->prefetch_maximum / 2)
-			node->prefetch_target = node->prefetch_maximum;
-		else if (node->prefetch_target > 0)
-			node->prefetch_target *= 2;
-		else
-			node->prefetch_target++;
-		return;
-	}
-
-	/* Do an unlocked check first to save spinlock acquisitions. */
-	if (pstate->prefetch_target < node->prefetch_maximum)
-	{
-		SpinLockAcquire(&pstate->mutex);
-		if (pstate->prefetch_target >= node->prefetch_maximum)
-			 /* don't increase any further */ ;
-		else if (pstate->prefetch_target >= node->prefetch_maximum / 2)
-			pstate->prefetch_target = node->prefetch_maximum;
-		else if (pstate->prefetch_target > 0)
-			pstate->prefetch_target *= 2;
-		else
-			pstate->prefetch_target++;
-		SpinLockRelease(&pstate->mutex);
-	}
-#endif							/* USE_PREFETCH */
-}
-
-/*
- * BitmapPrefetch - Prefetch, if prefetch_pages are behind prefetch_target
- */
-static inline void
-BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
-{
-#ifdef USE_PREFETCH
-	ParallelBitmapHeapState *pstate = node->pstate;
-
-	if (pstate == NULL)
-	{
-		TBMIterator *prefetch_iterator = &node->prefetch_iterator;
-
-		if (!tbm_exhausted(prefetch_iterator))
-		{
-			while (node->prefetch_pages < node->prefetch_target)
-			{
-				TBMIterateResult tbmpre;
-				bool		skip_fetch;
-
-				if (!tbm_iterate(prefetch_iterator, &tbmpre))
-				{
-					/* No more pages to prefetch */
-					Assert(!BlockNumberIsValid(tbmpre.blockno));
-					tbm_end_iterate(prefetch_iterator);
-					break;
-				}
-				node->prefetch_pages++;
-				node->prefetch_blockno = tbmpre.blockno;
-
-				/*
-				 * If we expect not to have to actually read this heap page,
-				 * skip this prefetch call, but continue to run the prefetch
-				 * logic normally.  (Would it be better not to increment
-				 * prefetch_pages?)
-				 */
-				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
-							  !tbmpre.recheck &&
-							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
-											 tbmpre.blockno,
-											 &node->pvmbuffer));
-
-				if (!skip_fetch)
-					PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre.blockno);
-			}
-		}
-
-		return;
-	}
-
-	if (pstate->prefetch_pages < pstate->prefetch_target)
-	{
-		TBMIterator *prefetch_iterator = &node->prefetch_iterator;
-
-		if (!tbm_exhausted(prefetch_iterator))
-		{
-			while (1)
-			{
-				TBMIterateResult tbmpre;
-				bool		do_prefetch = false;
-				bool		skip_fetch;
-
-				/*
-				 * Recheck under the mutex. If some other process has already
-				 * done enough prefetching then we need not to do anything.
-				 */
-				SpinLockAcquire(&pstate->mutex);
-				if (pstate->prefetch_pages < pstate->prefetch_target)
-				{
-					pstate->prefetch_pages++;
-					do_prefetch = true;
-				}
-				SpinLockRelease(&pstate->mutex);
-
-				if (!do_prefetch)
-					return;
-
-				if (!tbm_iterate(prefetch_iterator, &tbmpre))
-				{
-					Assert(!BlockNumberIsValid(tbmpre.blockno));
-					/* No more pages to prefetch */
-					tbm_end_iterate(prefetch_iterator);
-					break;
-				}
-
-				node->prefetch_blockno = tbmpre.blockno;
-
-				/* As above, skip prefetch if we expect not to need page */
-				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
-							  !tbmpre.recheck &&
-							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
-											 tbmpre.blockno,
-											 &node->pvmbuffer));
-
-				if (!skip_fetch)
-					PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, tbmpre.blockno);
-			}
-		}
-	}
-#endif							/* USE_PREFETCH */
-}
-
 /*
  * BitmapHeapRecheck -- access method routine to recheck a tuple in EvalPlanQual
  */
@@ -580,24 +280,12 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 		table_rescan(node->ss.ss_currentScanDesc, NULL);
 	}
 
-	/* If we did not already clean up the prefetch iterator, do so now. */
-	if (!tbm_exhausted(&node->prefetch_iterator))
-		tbm_end_iterate(&node->prefetch_iterator);
-
 	/* release bitmaps and buffers if any */
 	if (node->tbm)
 		tbm_free(node->tbm);
-	if (node->pvmbuffer != InvalidBuffer)
-		ReleaseBuffer(node->pvmbuffer);
 	node->tbm = NULL;
 	node->initialized = false;
-	node->pvmbuffer = InvalidBuffer;
 	node->recheck = true;
-	/* Only used for serial BHS */
-	node->blockno = InvalidBlockNumber;
-	node->prefetch_blockno = InvalidBlockNumber;
-	node->prefetch_pages = 0;
-	node->prefetch_target = -1;
 
 	ExecScanReScan(&node->ss);
 
@@ -666,17 +354,11 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 		table_endscan(scanDesc);
 	}
 
-	/* If we did not already clean up the prefetch iterator, do so now. */
-	if (!tbm_exhausted(&node->prefetch_iterator))
-		tbm_end_iterate(&node->prefetch_iterator);
-
 	/*
 	 * release bitmaps and buffers if any
 	 */
 	if (node->tbm)
 		tbm_free(node->tbm);
-	if (node->pvmbuffer != InvalidBuffer)
-		ReleaseBuffer(node->pvmbuffer);
 }
 
 /* ----------------------------------------------------------------
@@ -709,18 +391,13 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.ExecProcNode = ExecBitmapHeapScan;
 
 	scanstate->tbm = NULL;
-	scanstate->pvmbuffer = InvalidBuffer;
 
 	/* Zero the statistics counters */
 	memset(&scanstate->stats, 0, sizeof(BitmapHeapScanInstrumentation));
 
-	scanstate->prefetch_pages = 0;
-	scanstate->prefetch_target = -1;
 	scanstate->initialized = false;
 	scanstate->pstate = NULL;
 	scanstate->recheck = true;
-	scanstate->blockno = InvalidBlockNumber;
-	scanstate->prefetch_blockno = InvalidBlockNumber;
 
 	/*
 	 * Miscellaneous initialization
@@ -760,13 +437,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->bitmapqualorig =
 		ExecInitQual(node->bitmapqualorig, (PlanState *) scanstate);
 
-	/*
-	 * Maximum number of prefetches for the tablespace if configured,
-	 * otherwise the current value of the effective_io_concurrency GUC.
-	 */
-	scanstate->prefetch_maximum =
-		get_tablespace_io_concurrency(currentRelation->rd_rel->reltablespace);
-
 	scanstate->ss.ss_currentRelation = currentRelation;
 
 	/*
@@ -870,12 +540,9 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
 		sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
 
 	pstate->tbmiterator = 0;
-	pstate->prefetch_iterator = 0;
 
 	/* Initialize the mutex */
 	SpinLockInit(&pstate->mutex);
-	pstate->prefetch_pages = 0;
-	pstate->prefetch_target = -1;
 	pstate->state = BM_INITIAL;
 
 	ConditionVariableInit(&pstate->cv);
@@ -912,17 +579,11 @@ ExecBitmapHeapReInitializeDSM(BitmapHeapScanState *node,
 		return;
 
 	pstate->state = BM_INITIAL;
-	pstate->prefetch_pages = 0;
-	pstate->prefetch_target = -1;
 
 	if (DsaPointerIsValid(pstate->tbmiterator))
 		tbm_free_shared_area(dsa, pstate->tbmiterator);
 
-	if (DsaPointerIsValid(pstate->prefetch_iterator))
-		tbm_free_shared_area(dsa, pstate->prefetch_iterator);
-
 	pstate->tbmiterator = InvalidDsaPointer;
-	pstate->prefetch_iterator = InvalidDsaPointer;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 131c050c15f..507d4ebe68f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -797,28 +797,12 @@ typedef struct TableAmRoutine
 	 * always need to be rechecked, but some non-lossy pages' tuples may also
 	 * require recheck.
 	 *
-	 * `blockno` is the current block and is set by the table AM. The table AM
-	 * is responsible for advancing the main iterator, but the bitmap table
-	 * scan code still advances the prefetch iterator. `blockno` is used by
-	 * bitmap table scan code to validate that the prefetch block stays ahead
-	 * of the current block.
-	 *
-	 * XXX: Currently this may only be implemented if the AM uses md.c as its
-	 * storage manager, and uses ItemPointer->ip_blkid in a manner that maps
-	 * blockids directly to the underlying storage. nodeBitmapHeapscan.c
-	 * performs prefetching directly using that interface.  This probably
-	 * needs to be rectified at a later point.
-	 *
-	 * XXX: Currently this may only be implemented if the AM uses the
-	 * visibilitymap, as nodeBitmapHeapscan.c unconditionally accesses it to
-	 * perform prefetching.  This probably needs to be rectified at a later
-	 * point.
+	 * Prefetching additional data from the bitmap is left to the table AM.
 	 *
 	 * Optional callback, but either both scan_bitmap_next_block and
 	 * scan_bitmap_next_tuple need to exist, or neither.
 	 */
 	bool		(*scan_bitmap_next_block) (TableScanDesc scan,
-										   BlockNumber *blockno,
 										   bool *recheck,
 										   uint64 *lossy_pages,
 										   uint64 *exact_pages);
@@ -1966,16 +1950,11 @@ table_relation_estimate_size(Relation rel, int32 *attr_widths,
  * `recheck` is set by the table AM to indicate whether or not the tuples
  * from this block should be rechecked.
  *
- * `blockno` is the current block and is set by the table AM and is used by
- * bitmap table scan code to validate that the prefetch block stays ahead of
- * the current block.
- *
  * Note, this is an optionally implemented function, therefore should only be
  * used after verifying the presence (at plan time or such).
  */
 static inline bool
 table_scan_bitmap_next_block(TableScanDesc scan,
-							 BlockNumber *blockno,
 							 bool *recheck,
 							 uint64 *lossy_pages,
 							 uint64 *exact_pages)
@@ -1989,7 +1968,7 @@ table_scan_bitmap_next_block(TableScanDesc scan,
 		elog(ERROR, "unexpected table_scan_bitmap_next_block call during logical decoding");
 
 	return scan->rs_rd->rd_tableam->scan_bitmap_next_block(scan,
-														   blockno, recheck,
+														   recheck,
 														   lossy_pages,
 														   exact_pages);
 }
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a323fa98bbb..5f743d6699c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1828,11 +1828,7 @@ typedef enum
 /* ----------------
  *	 ParallelBitmapHeapState information
  *		tbmiterator				iterator for scanning current pages
- *		prefetch_iterator		iterator for prefetching ahead of current page
- *		mutex					mutual exclusion for the prefetching variable
- *								and state
- *		prefetch_pages			# pages prefetch iterator is ahead of current
- *		prefetch_target			current target prefetch distance
+ *		mutex					mutual exclusion for state
  *		state					current state of the TIDBitmap
  *		cv						conditional wait variable
  * ----------------
@@ -1840,10 +1836,7 @@ typedef enum
 typedef struct ParallelBitmapHeapState
 {
 	dsa_pointer tbmiterator;
-	dsa_pointer prefetch_iterator;
 	slock_t		mutex;
-	int			prefetch_pages;
-	int			prefetch_target;
 	SharedBitmapState state;
 	ConditionVariable cv;
 } ParallelBitmapHeapState;
@@ -1867,18 +1860,11 @@ typedef struct SharedBitmapHeapInstrumentation
  *
  *		bitmapqualorig	   execution state for bitmapqualorig expressions
  *		tbm				   bitmap obtained from child index scan(s)
- *		pvmbuffer		   buffer for visibility-map lookups of prefetched pages
  *		stats			   execution statistics
- *		prefetch_iterator  iterator for prefetching ahead of current page
- *		prefetch_pages	   # pages prefetch iterator is ahead of current
- *		prefetch_target    current target prefetch distance
- *		prefetch_maximum   maximum value for prefetch_target
  *		initialized		   is node is ready to iterate
  *		pstate			   shared state for parallel bitmap scan
  *		sinstrument		   statistics for parallel workers
  *		recheck			   do current page's tuples need recheck
- *		blockno			   used to validate pf and current block stay in sync
- *		prefetch_blockno   used to validate pf stays ahead of current block
  * ----------------
  */
 typedef struct BitmapHeapScanState
@@ -1886,18 +1872,11 @@ typedef struct BitmapHeapScanState
 	ScanState	ss;				/* its first field is NodeTag */
 	ExprState  *bitmapqualorig;
 	TIDBitmap  *tbm;
-	Buffer		pvmbuffer;
 	BitmapHeapScanInstrumentation stats;
-	TBMIterator prefetch_iterator;
-	int			prefetch_pages;
-	int			prefetch_target;
-	int			prefetch_maximum;
 	bool		initialized;
 	ParallelBitmapHeapState *pstate;
 	SharedBitmapHeapInstrumentation *sinstrument;
 	bool		recheck;
-	BlockNumber blockno;
-	BlockNumber prefetch_blockno;
 } BitmapHeapScanState;
 
 /* ----------------
-- 
2.34.1

From c3ae01de0ded48590995115065889f1613bf458f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 21 Feb 2025 10:58:59 -0500
Subject: [PATCH v33 1/5] Add lossy indicator to TBMIterateResult

TBMIterateResult->ntuples is -1 when the page in the bitmap is lossy.
Add an explicit lossy indicator so that we can move ntuples out of the
TBMIterateResult in a future commit.
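
After this patch the struct looks like this (from the tidbitmap.h hunk
below), and callers test the flag instead of the sentinel:

    typedef struct TBMIterateResult
    {
        BlockNumber blockno;    /* page number containing tuples */
        int         ntuples;    /* -1 when lossy */
        bool        lossy;
        bool        recheck;    /* always true if lossy */
        OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
    } TBMIterateResult;

    if (tbmres->lossy)          /* previously: tbmres->ntuples < 0 */
        (*lossy_pages)++;
    else
        (*exact_pages)++;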
---
 src/backend/access/gin/ginget.c          |  6 ++++--
 src/backend/access/heap/heapam_handler.c | 10 +++++-----
 src/backend/nodes/tidbitmap.c            | 11 ++++++++---
 src/include/nodes/tidbitmap.h            |  5 +++--
 4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 63dd1f3679f..54aecc1d1f1 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -827,7 +827,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			 * in the bitmap.
 			 */
 			while (entry->matchResult == NULL ||
-				   (entry->matchResult->ntuples >= 0 &&
+				   (!entry->matchResult->lossy &&
 					entry->offset >= entry->matchResult->ntuples) ||
 				   entry->matchResult->blockno < advancePastBlk ||
 				   (ItemPointerIsLossyPage(&advancePast) &&
@@ -860,7 +860,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			 * We're now on the first page after advancePast which has any
 			 * items on it. If it's a lossy result, return that.
 			 */
-			if (entry->matchResult->ntuples < 0)
+			if (entry->matchResult->lossy)
 			{
 				ItemPointerSetLossyPage(&entry->curItem,
 										entry->matchResult->blockno);
@@ -879,6 +879,8 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			 */
 			if (entry->matchResult->blockno == advancePastBlk)
 			{
+				Assert(entry->matchResult->ntuples > 0);
+
 				/*
 				 * First, do a quick check against the last offset on the
 				 * page. If that's > advancePast, so are all the other
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c0bec014154..269d581c2ec 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2170,7 +2170,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 		VM_ALL_VISIBLE(scan->rs_rd, tbmres->blockno, &bscan->rs_vmbuffer))
 	{
 		/* can't be lossy in the skip_fetch case */
-		Assert(tbmres->ntuples >= 0);
+		Assert(!tbmres->lossy);
 		Assert(bscan->rs_empty_tuples_pending >= 0);
 
 		bscan->rs_empty_tuples_pending += tbmres->ntuples;
@@ -2207,7 +2207,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	/*
 	 * We need two separate strategies for lossy and non-lossy cases.
 	 */
-	if (tbmres->ntuples >= 0)
+	if (!tbmres->lossy)
 	{
 		/*
 		 * Bitmap is non-lossy, so we just look through the offsets listed in
@@ -2268,10 +2268,10 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	Assert(ntup <= MaxHeapTuplesPerPage);
 	hscan->rs_ntuples = ntup;
 
-	if (tbmres->ntuples >= 0)
-		(*exact_pages)++;
-	else
+	if (tbmres->lossy)
 		(*lossy_pages)++;
+	else
+		(*exact_pages)++;
 
 	/*
 	 * Return true to indicate that a valid block was found and the bitmap is
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 66b3c387d53..3e0bca651f5 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -961,12 +961,13 @@ tbm_advance_schunkbit(PagetableEntry *chunk, int *schunkbitp)
  *
  * Returns a TBMIterateResult representing one page, or NULL if there are
  * no more pages to scan.  Pages are guaranteed to be delivered in numerical
- * order.  If result->ntuples < 0, then the bitmap is "lossy" and failed to
+ * order.  If lossy is true, then the bitmap is "lossy" and failed to
  * remember the exact tuples to look at on this page --- the caller must
  * examine all tuples on the page and check if they meet the intended
- * condition.  If result->recheck is true, only the indicated tuples need
+ * condition.  result->ntuples is set to -1 when the bitmap is lossy.
+ * If result->recheck is true, only the indicated tuples need
  * be examined, but the condition must be rechecked anyway.  (For ease of
- * testing, recheck is always set true when ntuples < 0.)
+ * testing, recheck is always set true when lossy is true.)
  */
 TBMIterateResult *
 tbm_private_iterate(TBMPrivateIterator *iterator)
@@ -1012,6 +1013,7 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
 			/* Return a lossy page indicator from the chunk */
 			output->blockno = chunk_blockno;
 			output->ntuples = -1;
+			output->lossy = true;
 			output->recheck = true;
 			iterator->schunkbit++;
 			return output;
@@ -1033,6 +1035,7 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
 		ntuples = tbm_extract_page_tuple(page, output);
 		output->blockno = page->blockno;
 		output->ntuples = ntuples;
+		output->lossy = false;
 		output->recheck = page->recheck;
 		iterator->spageptr++;
 		return output;
@@ -1105,6 +1108,7 @@ tbm_shared_iterate(TBMSharedIterator *iterator)
 			/* Return a lossy page indicator from the chunk */
 			output->blockno = chunk_blockno;
 			output->ntuples = -1;
+			output->lossy = true;
 			output->recheck = true;
 			istate->schunkbit++;
 
@@ -1122,6 +1126,7 @@ tbm_shared_iterate(TBMSharedIterator *iterator)
 		ntuples = tbm_extract_page_tuple(page, output);
 		output->blockno = page->blockno;
 		output->ntuples = ntuples;
+		output->lossy = false;
 		output->recheck = page->recheck;
 		istate->spageptr++;
 
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index a6ffeac90be..8cd93d90a86 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -54,9 +54,10 @@ typedef struct TBMIterator
 typedef struct TBMIterateResult
 {
 	BlockNumber blockno;		/* page number containing tuples */
-	int			ntuples;		/* -1 indicates lossy result */
+	int			ntuples;		/* -1 when lossy */
+	bool		lossy;
 	bool		recheck;		/* should the tuples be rechecked? */
-	/* Note: recheck is always true if ntuples < 0 */
+	/* Note: recheck is always true if lossy */
 	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
 } TBMIterateResult;
 
-- 
2.34.1
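
For illustration, here is a minimal sketch of the iteration contract after
this patch: callers now test the new "lossy" flag instead of checking
ntuples < 0. (tbm is assumed to be an already-populated TIDBitmap;
examine_whole_page() and examine_tuple() are hypothetical stand-ins for the
caller's per-page and per-tuple work.)

	TBMPrivateIterator *it = tbm_begin_private_iterate(tbm);
	TBMIterateResult *tbmres;

	while ((tbmres = tbm_private_iterate(it)) != NULL)
	{
		if (tbmres->lossy)
		{
			/* lossy page: no per-tuple offsets; recheck is always true */
			examine_whole_page(tbmres->blockno);
		}
		else
		{
			/* exact page: offsets[0 .. ntuples - 1] are valid */
			for (int i = 0; i < tbmres->ntuples; i++)
				examine_tuple(tbmres->blockno, tbmres->offsets[i]);
		}
	}
	tbm_end_private_iterate(it);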

From e364c7afa95819c67ccb101e9f35dad33f9f996b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 21 Feb 2025 11:02:45 -0500
Subject: [PATCH v33 2/5] Delay extraction of TIDBitmap per page offsets

Pages from the bitmap created by the TIDBitmap API can be exact or
lossy. Exact pages extract all the tuple offsets from the bitmap and put
them in an array for the convenience of the caller.

This was done in tbm_private|shared_iterate(). However, as long as
tbm_private|shared_iterate() sets a reference to the PagetableEntry in
the TBMIterateResult, the offset extraction can be done later.

Waiting to extract the tuple offsets has a few benefits. For the shared
iterator case, it allows us to extract the offsets after dropping the
shared iterator state lock, reducing time spent holding a contended
lock.

Separating the iteration step and extracting the offsets later also
allows us to avoid extracting the offsets for prefetched blocks. Those
offsets were never used, so the overhead of extracting and storing them
was wasted.

The real motivation for this change, however, is that future commits
will make bitmap heap scan use the read stream API. This requires a
TBMIterateResult per issued block. By removing the array of tuple
offsets from the TBMIterateResult and only extracting the offsets when
they are used, we reduce the memory required for per buffer data
substantially.

Suggested-by: Thomas Munro <thomas.mu...@gmail.com>
Reviewed-by: Thomas Munro <thomas.mu...@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
---
 src/backend/access/gin/ginget.c          | 28 ++++++++----
 src/backend/access/heap/heapam_handler.c | 17 +++++--
 src/backend/nodes/tidbitmap.c            | 57 ++++++++++--------------
 src/include/access/gin_private.h         |  2 +
 src/include/nodes/tidbitmap.h            | 32 ++++++++++---
 5 files changed, 86 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 54aecc1d1f1..f3b19d280c3 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -333,6 +333,7 @@ restartScanEntry:
 	entry->nlist = 0;
 	entry->matchBitmap = NULL;
 	entry->matchResult = NULL;
+	entry->matchNtuples = -1;
 	entry->reduceResult = false;
 	entry->predictNumberResult = 0;
 
@@ -828,7 +829,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			 */
 			while (entry->matchResult == NULL ||
 				   (!entry->matchResult->lossy &&
-					entry->offset >= entry->matchResult->ntuples) ||
+					entry->offset >= entry->matchNtuples) ||
 				   entry->matchResult->blockno < advancePastBlk ||
 				   (ItemPointerIsLossyPage(&advancePast) &&
 					entry->matchResult->blockno == advancePastBlk))
@@ -845,9 +846,15 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 					break;
 				}
 
+				/* Exact pages need their tuple offsets extracted. */
+				if (!entry->matchResult->lossy)
+					entry->matchNtuples = tbm_extract_page_tuple(entry->matchResult,
+																 entry->matchOffsets,
+																 TBM_MAX_TUPLES_PER_PAGE);
+
 				/*
 				 * Reset counter to the beginning of entry->matchResult. Note:
-				 * entry->offset is still greater than matchResult->ntuples if
+				 * entry->offset is still greater than entry->matchNtuples if
 				 * matchResult is lossy.  So, on next call we will get next
 				 * result from TIDBitmap.
 				 */
@@ -874,32 +881,35 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			}
 
 			/*
-			 * Not a lossy page. Skip over any offsets <= advancePast, and
-			 * return that.
+			 * Not a lossy page, so the tuple offsets must already have been
+			 * extracted, and entry->matchNtuples must be > -1.
 			 */
+			Assert(entry->matchNtuples > -1);
+
+			/* Skip over any offsets <= advancePast, and return that. */
 			if (entry->matchResult->blockno == advancePastBlk)
 			{
-				Assert(entry->matchResult->ntuples > 0);
+				Assert(entry->matchNtuples > 0);
 
 				/*
 				 * First, do a quick check against the last offset on the
 				 * page. If that's > advancePast, so are all the other
 				 * offsets, so just go back to the top to get the next page.
 				 */
-				if (entry->matchResult->offsets[entry->matchResult->ntuples - 1] <= advancePastOff)
+				if (entry->matchOffsets[entry->matchNtuples - 1] <= advancePastOff)
 				{
-					entry->offset = entry->matchResult->ntuples;
+					entry->offset = entry->matchNtuples;
 					continue;
 				}
 
 				/* Otherwise scan to find the first item > advancePast */
-				while (entry->matchResult->offsets[entry->offset] <= advancePastOff)
+				while (entry->matchOffsets[entry->offset] <= advancePastOff)
 					entry->offset++;
 			}
 
 			ItemPointerSet(&entry->curItem,
 						   entry->matchResult->blockno,
-						   entry->matchResult->offsets[entry->offset]);
+						   entry->matchOffsets[entry->offset]);
 			entry->offset++;
 
 			/* Done unless we need to reduce the result */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 269d581c2ec..e78682c3cef 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2127,6 +2127,8 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	Snapshot	snapshot;
 	int			ntup;
 	TBMIterateResult *tbmres;
+	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
+	int			noffsets = -1;
 
 	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
 
@@ -2145,6 +2147,11 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 		if (tbmres == NULL)
 			return false;
 
+		/* Exact pages need their tuple offsets extracted. */
+		if (!tbmres->lossy)
+			noffsets = tbm_extract_page_tuple(tbmres, offsets,
+											  TBM_MAX_TUPLES_PER_PAGE);
+
 		/*
 		 * Ignore any claimed entries past what we think is the end of the
 		 * relation. It may have been extended after the start of our scan (we
@@ -2172,8 +2179,9 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 		/* can't be lossy in the skip_fetch case */
 		Assert(!tbmres->lossy);
 		Assert(bscan->rs_empty_tuples_pending >= 0);
+		Assert(noffsets > -1);
 
-		bscan->rs_empty_tuples_pending += tbmres->ntuples;
+		bscan->rs_empty_tuples_pending += noffsets;
 
 		return true;
 	}
@@ -2216,9 +2224,12 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 		 */
 		int			curslot;
 
-		for (curslot = 0; curslot < tbmres->ntuples; curslot++)
+		/* We must have extracted the tuple offsets by now */
+		Assert(noffsets > -1);
+
+		for (curslot = 0; curslot < noffsets; curslot++)
 		{
-			OffsetNumber offnum = tbmres->offsets[curslot];
+			OffsetNumber offnum = offsets[curslot];
 			ItemPointerData tid;
 			HeapTupleData heapTuple;
 
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 3e0bca651f5..3d835024caa 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -40,7 +40,6 @@
 
 #include <limits.h>
 
-#include "access/htup_details.h"
 #include "common/hashfn.h"
 #include "common/int.h"
 #include "nodes/bitmapset.h"
@@ -48,14 +47,6 @@
 #include "storage/lwlock.h"
 #include "utils/dsa.h"
 
-/*
- * The maximum number of tuples per page is not large (typically 256 with
- * 8K pages, or 1024 with 32K pages).  So there's not much point in making
- * the per-page bitmaps variable size.  We just legislate that the size
- * is this:
- */
-#define MAX_TUPLES_PER_PAGE  MaxHeapTuplesPerPage
-
 /*
  * When we have to switch over to lossy storage, we use a data structure
  * with one bit per page, where all pages having the same number DIV
@@ -67,7 +58,7 @@
  * table, using identical data structures.  (This is because the memory
  * management for hashtables doesn't easily/efficiently allow space to be
  * transferred easily from one hashtable to another.)  Therefore it's best
- * if PAGES_PER_CHUNK is the same as MAX_TUPLES_PER_PAGE, or at least not
+ * if PAGES_PER_CHUNK is the same as TBM_MAX_TUPLES_PER_PAGE, or at least not
  * too different.  But we also want PAGES_PER_CHUNK to be a power of 2 to
  * avoid expensive integer remainder operations.  So, define it like this:
  */
@@ -79,7 +70,7 @@
 #define BITNUM(x)	((x) % BITS_PER_BITMAPWORD)
 
 /* number of active words for an exact page: */
-#define WORDS_PER_PAGE	((MAX_TUPLES_PER_PAGE - 1) / BITS_PER_BITMAPWORD + 1)
+#define WORDS_PER_PAGE	((TBM_MAX_TUPLES_PER_PAGE - 1) / BITS_PER_BITMAPWORD + 1)
 /* number of active words for a lossy chunk: */
 #define WORDS_PER_CHUNK  ((PAGES_PER_CHUNK - 1) / BITS_PER_BITMAPWORD + 1)
 
@@ -181,7 +172,7 @@ struct TBMPrivateIterator
 	int			spageptr;		/* next spages index */
 	int			schunkptr;		/* next schunks index */
 	int			schunkbit;		/* next bit to check in current schunk */
-	TBMIterateResult output;	/* MUST BE LAST (because variable-size) */
+	TBMIterateResult output;
 };
 
 /*
@@ -222,7 +213,7 @@ struct TBMSharedIterator
 	PTEntryArray *ptbase;		/* pagetable element array */
 	PTIterationArray *ptpages;	/* sorted exact page index list */
 	PTIterationArray *ptchunks; /* sorted lossy page index list */
-	TBMIterateResult output;	/* MUST BE LAST (because variable-size) */
+	TBMIterateResult output;
 };
 
 /* Local function prototypes */
@@ -390,7 +381,7 @@ tbm_add_tuples(TIDBitmap *tbm, const ItemPointer tids, int ntids,
 					bitnum;
 
 		/* safety check to ensure we don't overrun bit array bounds */
-		if (off < 1 || off > MAX_TUPLES_PER_PAGE)
+		if (off < 1 || off > TBM_MAX_TUPLES_PER_PAGE)
 			elog(ERROR, "tuple offset out of range: %u", off);
 
 		/*
@@ -696,9 +687,7 @@ tbm_begin_private_iterate(TIDBitmap *tbm)
 	 * Create the TBMPrivateIterator struct, with enough trailing space to
 	 * serve the needs of the TBMIterateResult sub-struct.
 	 */
-	iterator = (TBMPrivateIterator *) palloc(sizeof(TBMPrivateIterator) +
-											 MAX_TUPLES_PER_PAGE *
-											 sizeof(OffsetNumber));
+	iterator = (TBMPrivateIterator *) palloc(sizeof(TBMPrivateIterator));
 	iterator->tbm = tbm;
 
 	/*
@@ -906,11 +895,16 @@ tbm_prepare_shared_iterate(TIDBitmap *tbm)
 /*
  * tbm_extract_page_tuple - extract the tuple offsets from a page
  *
- * The extracted offsets are stored into TBMIterateResult.
+ * Fills the offsets array with up to max_offsets tuple offsets and returns
+ * the total number of offsets on the page. A return value larger than
+ * max_offsets means the array was too small to hold them all.
  */
-static inline int
-tbm_extract_page_tuple(PagetableEntry *page, TBMIterateResult *output)
+int
+tbm_extract_page_tuple(TBMIterateResult *iteritem,
+					   OffsetNumber *offsets,
+					   uint32 max_offsets)
 {
+	PagetableEntry *page = iteritem->internal_page;
 	int			wordnum;
 	int			ntuples = 0;
 
@@ -925,7 +919,11 @@ tbm_extract_page_tuple(PagetableEntry *page, TBMIterateResult *output)
 			while (w != 0)
 			{
 				if (w & 1)
-					output->offsets[ntuples++] = (OffsetNumber) off;
+				{
+					if (ntuples < max_offsets)
+						offsets[ntuples] = (OffsetNumber) off;
+					ntuples++;
+				}
 				off++;
 				w >>= 1;
 			}
@@ -1012,9 +1010,9 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
 		{
 			/* Return a lossy page indicator from the chunk */
 			output->blockno = chunk_blockno;
-			output->ntuples = -1;
 			output->lossy = true;
 			output->recheck = true;
+			output->internal_page = NULL;
 			iterator->schunkbit++;
 			return output;
 		}
@@ -1023,7 +1021,6 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
 	if (iterator->spageptr < tbm->npages)
 	{
 		PagetableEntry *page;
-		int			ntuples;
 
 		/* In TBM_ONE_PAGE state, we don't allocate an spages[] array */
 		if (tbm->status == TBM_ONE_PAGE)
@@ -1031,10 +1028,8 @@ tbm_private_iterate(TBMPrivateIterator *iterator)
 		else
 			page = tbm->spages[iterator->spageptr];
 
-		/* scan bitmap to extract individual offset numbers */
-		ntuples = tbm_extract_page_tuple(page, output);
+		output->internal_page = page;
 		output->blockno = page->blockno;
-		output->ntuples = ntuples;
 		output->lossy = false;
 		output->recheck = page->recheck;
 		iterator->spageptr++;
@@ -1107,9 +1102,9 @@ tbm_shared_iterate(TBMSharedIterator *iterator)
 		{
 			/* Return a lossy page indicator from the chunk */
 			output->blockno = chunk_blockno;
-			output->ntuples = -1;
 			output->lossy = true;
 			output->recheck = true;
+			output->internal_page = NULL;
 			istate->schunkbit++;
 
 			LWLockRelease(&istate->lock);
@@ -1120,12 +1115,9 @@ tbm_shared_iterate(TBMSharedIterator *iterator)
 	if (istate->spageptr < istate->npages)
 	{
 		PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
-		int			ntuples;
 
-		/* scan bitmap to extract individual offset numbers */
-		ntuples = tbm_extract_page_tuple(page, output);
+		output->internal_page = page;
 		output->blockno = page->blockno;
-		output->ntuples = ntuples;
 		output->lossy = false;
 		output->recheck = page->recheck;
 		istate->spageptr++;
@@ -1473,8 +1465,7 @@ tbm_attach_shared_iterate(dsa_area *dsa, dsa_pointer dp)
 	 * Create the TBMSharedIterator struct, with enough trailing space to
 	 * serve the needs of the TBMIterateResult sub-struct.
 	 */
-	iterator = (TBMSharedIterator *) palloc0(sizeof(TBMSharedIterator) +
-											 MAX_TUPLES_PER_PAGE * sizeof(OffsetNumber));
+	iterator = (TBMSharedIterator *) palloc0(sizeof(TBMSharedIterator));
 
 	istate = (TBMSharedIteratorState *) dsa_get_address(dsa, dp);
 
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index dcd1ae3fc34..50478db9820 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -354,6 +354,8 @@ typedef struct GinScanEntryData
 	TIDBitmap  *matchBitmap;
 	TBMPrivateIterator *matchIterator;
 	TBMIterateResult *matchResult;
+	OffsetNumber matchOffsets[TBM_MAX_TUPLES_PER_PAGE];
+	int			matchNtuples;
 
 	/* used for Posting list and one page in Posting tree */
 	ItemPointerData *list;
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index 8cd93d90a86..e185635c10b 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -22,9 +22,17 @@
 #ifndef TIDBITMAP_H
 #define TIDBITMAP_H
 
+#include "access/htup_details.h"
 #include "storage/itemptr.h"
 #include "utils/dsa.h"
 
+/*
+ * The maximum number of tuples per page is not large (typically 256 with
+ * 8K pages, or 1024 with 32K pages).  So there's not much point in making
+ * the per-page bitmaps variable size.  We just legislate that the size
+ * is this:
+ */
+#define TBM_MAX_TUPLES_PER_PAGE  MaxHeapTuplesPerPage
 
 /*
  * Actual bitmap representation is private to tidbitmap.c.  Callers can
@@ -53,12 +61,22 @@ typedef struct TBMIterator
 /* Result structure for tbm_iterate */
 typedef struct TBMIterateResult
 {
-	BlockNumber blockno;		/* page number containing tuples */
-	int			ntuples;		/* -1 when lossy */
+	BlockNumber blockno;		/* block number containing tuples */
+
 	bool		lossy;
-	bool		recheck;		/* should the tuples be rechecked? */
-	/* Note: recheck is always true if lossy */
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+
+	/*
+	 * Whether or not the tuples should be rechecked. This is always true if
+	 * the page is lossy but may also be true if the query requires recheck.
+	 */
+	bool		recheck;
+
+	/*
+	 * Pointer to the page containing the bitmap for this block. It is a void *
+	 * to avoid exposing the details of the tidbitmap PagetableEntry to API
+	 * users.
+	 */
+	void	   *internal_page;
 } TBMIterateResult;
 
 /* function prototypes in nodes/tidbitmap.c */
@@ -75,6 +93,10 @@ extern void tbm_add_page(TIDBitmap *tbm, BlockNumber pageno);
 extern void tbm_union(TIDBitmap *a, const TIDBitmap *b);
 extern void tbm_intersect(TIDBitmap *a, const TIDBitmap *b);
 
+extern int	tbm_extract_page_tuple(TBMIterateResult *iteritem,
+								   OffsetNumber *offsets,
+								   uint32 max_offsets);
+
 extern bool tbm_is_empty(const TIDBitmap *tbm);
 
 extern TBMPrivateIterator *tbm_begin_private_iterate(TIDBitmap *tbm);
-- 
2.34.1
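
To make the resulting API concrete, here is a sketch of a caller using the
delayed extraction (same hypothetical helpers as in the sketch above):

	TBMPrivateIterator *it = tbm_begin_private_iterate(tbm);
	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
	TBMIterateResult *tbmres;
	int			noffsets;

	while ((tbmres = tbm_private_iterate(it)) != NULL)
	{
		if (tbmres->lossy)
		{
			examine_whole_page(tbmres->blockno);
			continue;			/* lossy pages have no offsets to extract */
		}

		/* Offsets are extracted only here, when they are actually used. */
		noffsets = tbm_extract_page_tuple(tbmres, offsets,
										  TBM_MAX_TUPLES_PER_PAGE);

		for (int i = 0; i < noffsets; i++)
			examine_tuple(tbmres->blockno, offsets[i]);
	}
	tbm_end_private_iterate(it);

Because the return value is the total number of offsets on the page, a
caller passing a smaller array can detect truncation by comparing the
result against the max_offsets it passed in.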

From 3b8f68ee9705714e9e8977292b53cbd2f7269478 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 21 Feb 2025 15:32:33 -0500
Subject: [PATCH v33 5/5] Remove table AM callback scan_bitmap_next_block

After pushing the bitmap iterator into table-AM specific code (as part
of making bitmap heap scan use the read stream API),
scan_bitmap_next_block() no longer returns the current block number.
Since scan_bitmap_next_block() isn't returning any relevant information
to bitmap table scan code, it makes more sense to get rid of it.

Now, bitmap table scan code only calls table_scan_bitmap_next_tuple(),
and the heap AM implementation of scan_bitmap_next_block() is a local
helper in heapam_handler.c.

Reviewed-by: Tomas Vondra <to...@vondra.me>
Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me
---
 src/backend/access/heap/heapam_handler.c  | 386 ++++++++++++----------
 src/backend/access/table/tableamapi.c     |   3 -
 src/backend/executor/nodeBitmapHeapscan.c |  73 ++--
 src/backend/optimizer/util/plancat.c      |   2 +-
 src/include/access/tableam.h              |  90 ++---
 5 files changed, 251 insertions(+), 303 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4734283a9a9..8fbf5adca8b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -56,6 +56,10 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 
 static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan);
 
+static bool BitmapHeapScanNextBlock(TableScanDesc scan,
+									bool *recheck,
+									uint64 *lossy_pages, uint64 *exact_pages);
+
 
 /* ------------------------------------------------------------------------
  * Slot related callbacks for heap AM
@@ -2116,205 +2120,44 @@ heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
  */
 
 static bool
-heapam_scan_bitmap_next_block(TableScanDesc scan,
+heapam_scan_bitmap_next_tuple(TableScanDesc scan,
+							  TupleTableSlot *slot,
 							  bool *recheck,
-							  uint64 *lossy_pages, uint64 *exact_pages)
+							  uint64 *lossy_pages,
+							  uint64 *exact_pages)
 {
 	BitmapHeapScanDesc bscan = (BitmapHeapScanDesc) scan;
 	HeapScanDesc hscan = (HeapScanDesc) bscan;
-	BlockNumber block;
-	void	   *per_buffer_data;
-	Buffer		buffer;
-	Snapshot	snapshot;
-	int			ntup;
-	TBMIterateResult *tbmres;
-	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
-	int			noffsets = -1;
-
-	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
-	Assert(hscan->rs_read_stream);
-
-	hscan->rs_cindex = 0;
-	hscan->rs_ntuples = 0;
-
-	/* Release buffer containing previous block. */
-	if (BufferIsValid(hscan->rs_cbuf))
-	{
-		ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-	}
-
-	hscan->rs_cbuf = read_stream_next_buffer(hscan->rs_read_stream,
-											 &per_buffer_data);
-
-	if (BufferIsInvalid(hscan->rs_cbuf))
-	{
-		if (BufferIsValid(bscan->rs_vmbuffer))
-		{
-			ReleaseBuffer(bscan->rs_vmbuffer);
-			bscan->rs_vmbuffer = InvalidBuffer;
-		}
-
-		/*
-		 * Bitmap is exhausted. Time to emit empty tuples if relevant. We emit
-		 * all empty tuples at the end instead of emitting them per block we
-		 * skip fetching. This is necessary because the streaming read API
-		 * will only return TBMIterateResults for blocks actually fetched.
-		 * When we skip fetching a block, we keep track of how many empty
-		 * tuples to emit at the end of the BitmapHeapScan. We do not recheck
-		 * all NULL tuples.
-		 */
-		*recheck = false;
-		return bscan->rs_empty_tuples_pending > 0;
-	}
-
-	Assert(per_buffer_data);
-
-	tbmres = per_buffer_data;
-
-	Assert(BlockNumberIsValid(tbmres->blockno));
-	Assert(BufferGetBlockNumber(hscan->rs_cbuf) == tbmres->blockno);
-
-	/*
-	 * Exact pages need their tuple offsets extracted.
-	 * tbm_extract_page_tuple() will set offsets.ntuples to the correct
-	 * number.
-	 */
-	if (!tbmres->lossy)
-		noffsets = tbm_extract_page_tuple(tbmres, offsets,
-										  TBM_MAX_TUPLES_PER_PAGE);
-
-	*recheck = tbmres->recheck;
-
-	hscan->rs_cblock = tbmres->blockno;
-	hscan->rs_ntuples = noffsets;
-
-	block = tbmres->blockno;
-	buffer = hscan->rs_cbuf;
-	snapshot = scan->rs_snapshot;
-
-	ntup = 0;
-
-	/*
-	 * Prune and repair fragmentation for the whole page, if possible.
-	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
-
-	/*
-	 * We must hold share lock on the buffer content while examining tuple
-	 * visibility.  Afterwards, however, the tuples we have found to be
-	 * visible are guaranteed good as long as we hold the buffer pin.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_SHARE);
+	OffsetNumber targoffset;
+	Page		page;
+	ItemId		lp;
 
 	/*
-	 * We need two separate strategies for lossy and non-lossy cases.
+	 * Out of range?  If so, nothing more to look at on this page
 	 */
-	if (!tbmres->lossy)
+	while (hscan->rs_cindex >= hscan->rs_ntuples)
 	{
 		/*
-		 * Bitmap is non-lossy, so we just look through the offsets listed in
-		 * tbmres; but we have to follow any HOT chain starting at each such
-		 * offset.
+		 * Emit empty tuples before advancing to the next block
 		 */
-		int			curslot;
-
-		/* We must have extracted the tuple offsets by now */
-		Assert(noffsets > -1);
-
-		for (curslot = 0; curslot < noffsets; curslot++)
+		if (bscan->rs_empty_tuples_pending > 0)
 		{
-			OffsetNumber offnum = offsets[curslot];
-			ItemPointerData tid;
-			HeapTupleData heapTuple;
-
-			ItemPointerSet(&tid, block, offnum);
-			if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot,
-									   &heapTuple, NULL, true))
-				hscan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
-		}
-	}
-	else
-	{
-		/*
-		 * Bitmap is lossy, so we must examine each line pointer on the page.
-		 * But we can ignore HOT chains, since we'll check each tuple anyway.
-		 */
-		Page		page = BufferGetPage(buffer);
-		OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
-		OffsetNumber offnum;
-
-		for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
-		{
-			ItemId		lp;
-			HeapTupleData loctup;
-			bool		valid;
-
-			lp = PageGetItemId(page, offnum);
-			if (!ItemIdIsNormal(lp))
-				continue;
-			loctup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
-			loctup.t_len = ItemIdGetLength(lp);
-			loctup.t_tableOid = scan->rs_rd->rd_id;
-			ItemPointerSet(&loctup.t_self, block, offnum);
-			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
-			if (valid)
-			{
-				hscan->rs_vistuples[ntup++] = offnum;
-				PredicateLockTID(scan->rs_rd, &loctup.t_self, snapshot,
-								 HeapTupleHeaderGetXmin(loctup.t_data));
-			}
-			HeapCheckForSerializableConflictOut(valid, scan->rs_rd, &loctup,
-												buffer, snapshot);
+			/*
+			 * If we don't have to fetch the tuple, just return nulls.
+			 */
+			ExecStoreAllNullTuple(slot);
+			bscan->rs_empty_tuples_pending--;
+			return true;
 		}
-	}
-
-	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-
-	Assert(ntup <= MaxHeapTuplesPerPage);
-	hscan->rs_ntuples = ntup;
-
-	if (tbmres->lossy)
-		(*lossy_pages)++;
-	else
-		(*exact_pages)++;
 
-	/*
-	 * Return true to indicate that a valid block was found and the bitmap is
-	 * not exhausted. If there are no visible tuples on this page,
-	 * hscan->rs_ntuples will be 0 and heapam_scan_bitmap_next_tuple() will
-	 * return false returning control to this function to advance to the next
-	 * block in the bitmap.
-	 */
-	return true;
-}
-
-static bool
-heapam_scan_bitmap_next_tuple(TableScanDesc scan,
-							  TupleTableSlot *slot)
-{
-	BitmapHeapScanDesc bscan = (BitmapHeapScanDesc) scan;
-	HeapScanDesc hscan = (HeapScanDesc) bscan;
-	OffsetNumber targoffset;
-	Page		page;
-	ItemId		lp;
-
-	if (bscan->rs_empty_tuples_pending > 0)
-	{
 		/*
-		 * If we don't have to fetch the tuple, just return nulls.
+		 * Returns false if the bitmap is exhausted and there are no further
+		 * blocks we need to scan.
 		 */
-		ExecStoreAllNullTuple(slot);
-		bscan->rs_empty_tuples_pending--;
-		return true;
+		if (!BitmapHeapScanNextBlock(scan, recheck, lossy_pages, exact_pages))
+			return false;
 	}
 
-	/*
-	 * Out of range?  If so, nothing more to look at on this page
-	 */
-	if (hscan->rs_cindex >= hscan->rs_ntuples)
-		return false;
-
 	targoffset = hscan->rs_vistuples[hscan->rs_cindex];
 	page = BufferGetPage(hscan->rs_cbuf);
 	lp = PageGetItemId(page, targoffset);
@@ -2621,6 +2464,184 @@ SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 	}
 }
 
+/*
+ * Helper function to get the next block of a bitmap heap scan. Returns true
+ * when it gets the next block and saves it in the scan descriptor, and false
+ * when the bitmap or the relation is exhausted.
+ */
+static bool
+BitmapHeapScanNextBlock(TableScanDesc scan,
+						bool *recheck,
+						uint64 *lossy_pages, uint64 *exact_pages)
+{
+	BitmapHeapScanDesc bscan = (BitmapHeapScanDesc) scan;
+	HeapScanDesc hscan = (HeapScanDesc) bscan;
+	BlockNumber block;
+	void	   *per_buffer_data;
+	Buffer		buffer;
+	Snapshot	snapshot;
+	int			ntup;
+	TBMIterateResult *tbmres;
+	OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
+	int			noffsets = -1;
+
+	Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
+	Assert(hscan->rs_read_stream);
+
+	hscan->rs_cindex = 0;
+	hscan->rs_ntuples = 0;
+
+	/* Release buffer containing previous block. */
+	if (BufferIsValid(hscan->rs_cbuf))
+	{
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
+	}
+
+	hscan->rs_cbuf = read_stream_next_buffer(hscan->rs_read_stream,
+											 &per_buffer_data);
+
+	if (BufferIsInvalid(hscan->rs_cbuf))
+	{
+		if (BufferIsValid(bscan->rs_vmbuffer))
+		{
+			ReleaseBuffer(bscan->rs_vmbuffer);
+			bscan->rs_vmbuffer = InvalidBuffer;
+		}
+
+		/*
+		 * Bitmap is exhausted. Time to emit empty tuples if relevant. We emit
+		 * all empty tuples at the end instead of emitting them per block we
+		 * skip fetching. This is necessary because the streaming read API
+		 * will only return TBMIterateResults for blocks actually fetched.
+		 * When we skip fetching a block, we keep track of how many empty
+		 * tuples to emit at the end of the BitmapHeapScan. We do not recheck
+		 * all NULL tuples.
+		 */
+		*recheck = false;
+		return bscan->rs_empty_tuples_pending > 0;
+	}
+
+	Assert(per_buffer_data);
+
+	tbmres = per_buffer_data;
+
+	Assert(BlockNumberIsValid(tbmres->blockno));
+	Assert(BufferGetBlockNumber(hscan->rs_cbuf) == tbmres->blockno);
+
+	/*
+	 * Exact pages need their tuple offsets extracted.
+	 * tbm_extract_page_tuple() fills the offsets array and returns the
+	 * number of offsets it extracted.
+	 */
+	if (!tbmres->lossy)
+		noffsets = tbm_extract_page_tuple(tbmres, offsets,
+										  TBM_MAX_TUPLES_PER_PAGE);
+
+	*recheck = tbmres->recheck;
+
+	hscan->rs_cblock = tbmres->blockno;
+	hscan->rs_ntuples = noffsets;
+
+	block = tbmres->blockno;
+	buffer = hscan->rs_cbuf;
+	snapshot = scan->rs_snapshot;
+
+	ntup = 0;
+
+	/*
+	 * Prune and repair fragmentation for the whole page, if possible.
+	 */
+	heap_page_prune_opt(scan->rs_rd, buffer);
+
+	/*
+	 * We must hold share lock on the buffer content while examining tuple
+	 * visibility.  Afterwards, however, the tuples we have found to be
+	 * visible are guaranteed good as long as we hold the buffer pin.
+	 */
+	LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	/*
+	 * We need two separate strategies for lossy and non-lossy cases.
+	 */
+	if (!tbmres->lossy)
+	{
+		/*
+		 * Bitmap is non-lossy, so we just look through the offsets listed in
+		 * tbmres; but we have to follow any HOT chain starting at each such
+		 * offset.
+		 */
+		int			curslot;
+
+		/* We must have extracted the tuple offsets by now */
+		Assert(noffsets > -1);
+
+		for (curslot = 0; curslot < noffsets; curslot++)
+		{
+			OffsetNumber offnum = offsets[curslot];
+			ItemPointerData tid;
+			HeapTupleData heapTuple;
+
+			ItemPointerSet(&tid, block, offnum);
+			if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot,
+									   &heapTuple, NULL, true))
+				hscan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
+		}
+	}
+	else
+	{
+		/*
+		 * Bitmap is lossy, so we must examine each line pointer on the page.
+		 * But we can ignore HOT chains, since we'll check each tuple anyway.
+		 */
+		Page		page = BufferGetPage(buffer);
+		OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+		OffsetNumber offnum;
+
+		for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
+		{
+			ItemId		lp;
+			HeapTupleData loctup;
+			bool		valid;
+
+			lp = PageGetItemId(page, offnum);
+			if (!ItemIdIsNormal(lp))
+				continue;
+			loctup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+			loctup.t_len = ItemIdGetLength(lp);
+			loctup.t_tableOid = scan->rs_rd->rd_id;
+			ItemPointerSet(&loctup.t_self, block, offnum);
+			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+			if (valid)
+			{
+				hscan->rs_vistuples[ntup++] = offnum;
+				PredicateLockTID(scan->rs_rd, &loctup.t_self, snapshot,
+								 HeapTupleHeaderGetXmin(loctup.t_data));
+			}
+			HeapCheckForSerializableConflictOut(valid, scan->rs_rd, &loctup,
+												buffer, snapshot);
+		}
+	}
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	Assert(ntup <= MaxHeapTuplesPerPage);
+	hscan->rs_ntuples = ntup;
+
+	if (tbmres->lossy)
+		(*lossy_pages)++;
+	else
+		(*exact_pages)++;
+
+	/*
+	 * Return true to indicate that a valid block was found and the bitmap is
+	 * not exhausted. If there are no visible tuples on this page,
+	 * hscan->rs_ntuples will be 0 and heapam_scan_bitmap_next_tuple() will
+	 * return false returning control to this function to advance to the next
+	 * block in the bitmap.
+	 */
+	return true;
+}
 
 /* ------------------------------------------------------------------------
  * Definition of the heap table access method.
@@ -2680,7 +2701,6 @@ static const TableAmRoutine heapam_methods = {
 
 	.relation_estimate_size = heapam_estimate_rel_size,
 
-	.scan_bitmap_next_block = heapam_scan_bitmap_next_block,
 	.scan_bitmap_next_tuple = heapam_scan_bitmap_next_tuple,
 	.scan_sample_next_block = heapam_scan_sample_next_block,
 	.scan_sample_next_tuple = heapam_scan_sample_next_tuple
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 760a36fd2a1..476663b66aa 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -91,9 +91,6 @@ GetTableAmRoutine(Oid amhandler)
 
 	Assert(routine->relation_estimate_size != NULL);
 
-	/* optional, but one callback implies presence of the other */
-	Assert((routine->scan_bitmap_next_block == NULL) ==
-		   (routine->scan_bitmap_next_tuple == NULL));
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 6df34094a13..3e33360c0fc 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -138,69 +138,44 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 static TupleTableSlot *
 BitmapHeapNext(BitmapHeapScanState *node)
 {
-	ExprContext *econtext;
-	TableScanDesc scan;
-	TupleTableSlot *slot;
-
-	/*
-	 * extract necessary information from index scan node
-	 */
-	econtext = node->ss.ps.ps_ExprContext;
-	slot = node->ss.ss_ScanTupleSlot;
-	scan = node->ss.ss_currentScanDesc;
+	ExprContext *econtext = node->ss.ps.ps_ExprContext;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 
 	/*
 	 * If we haven't yet performed the underlying index scan, do it, and begin
 	 * the iteration over the bitmap.
 	 */
 	if (!node->initialized)
-	{
 		BitmapTableScanSetup(node);
-		scan = node->ss.ss_currentScanDesc;
-		goto new_page;
-	}
 
-	for (;;)
+	while (table_scan_bitmap_next_tuple(node->ss.ss_currentScanDesc,
+										slot, &node->recheck,
+										&node->stats.lossy_pages,
+										&node->stats.exact_pages))
 	{
-		while (table_scan_bitmap_next_tuple(scan, slot))
-		{
-			/*
-			 * Continuing in previously obtained page.
-			 */
-
-			CHECK_FOR_INTERRUPTS();
+		/*
+		 * Continuing in previously obtained page.
+		 */
+		CHECK_FOR_INTERRUPTS();
 
-			/*
-			 * If we are using lossy info, we have to recheck the qual
-			 * conditions at every tuple.
-			 */
-			if (node->recheck)
+		/*
+		 * If we are using lossy info, we have to recheck the qual conditions
+		 * at every tuple.
+		 */
+		if (node->recheck)
+		{
+			econtext->ecxt_scantuple = slot;
+			if (!ExecQualAndReset(node->bitmapqualorig, econtext))
 			{
-				econtext->ecxt_scantuple = slot;
-				if (!ExecQualAndReset(node->bitmapqualorig, econtext))
-				{
-					/* Fails recheck, so drop it and loop back for another */
-					InstrCountFiltered2(node, 1);
-					ExecClearTuple(slot);
-					continue;
-				}
+				/* Fails recheck, so drop it and loop back for another */
+				InstrCountFiltered2(node, 1);
+				ExecClearTuple(slot);
+				continue;
 			}
-
-			/* OK to return this tuple */
-			return slot;
 		}
 
-new_page:
-
-		/*
-		 * Returns false if the bitmap is exhausted and there are no further
-		 * blocks we need to scan.
-		 */
-		if (!table_scan_bitmap_next_block(scan,
-										  &node->recheck,
-										  &node->stats.lossy_pages,
-										  &node->stats.exact_pages))
-			break;
+		/* OK to return this tuple */
+		return slot;
 	}
 
 	/*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 71abb01f655..0489ad36644 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -325,7 +325,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				info->amcanparallel = amroutine->amcanparallel;
 				info->amhasgettuple = (amroutine->amgettuple != NULL);
 				info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
-					relation->rd_tableam->scan_bitmap_next_block != NULL;
+					relation->rd_tableam->scan_bitmap_next_tuple != NULL;
 				info->amcanmarkpos = (amroutine->ammarkpos != NULL &&
 									  amroutine->amrestrpos != NULL);
 				info->amcostestimate = amroutine->amcostestimate;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 507d4ebe68f..b8cb1e744ad 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -780,43 +780,23 @@ typedef struct TableAmRoutine
 	 */
 
 	/*
-	 * Prepare to fetch / check / return tuples from `blockno` as part of a
-	 * bitmap table scan. `scan` was started via table_beginscan_bm(). Return
-	 * false if the bitmap is exhausted and true otherwise.
-	 *
-	 * This will typically read and pin the target block, and do the necessary
-	 * work to allow scan_bitmap_next_tuple() to return tuples (e.g. it might
-	 * make sense to perform tuple visibility checks at this time).
-	 *
-	 * `lossy_pages` and `exact_pages` are EXPLAIN counters that can be
-	 * incremented by the table AM to indicate whether or not the block's
-	 * representation in the bitmap is lossy.
+	 * Fetch the next tuple of a bitmap table scan into `slot` and return true
+	 * if a visible tuple was found, false otherwise.
 	 *
-	 * `recheck` is set by the table AM to indicate whether or not the tuples
-	 * from this block should be rechecked. Tuples from lossy pages will
-	 * always need to be rechecked, but some non-lossy pages' tuples may also
-	 * require recheck.
+	 * `lossy_pages` is incremented if the bitmap is lossy for the selected
+	 * page; otherwise, `exact_pages` is incremented. These are tracked for
+	 * display in EXPLAIN ANALYZE output.
 	 *
 	 * Prefetching additional data from the bitmap is left to the table AM.
 	 *
-	 * Optional callback, but either both scan_bitmap_next_block and
-	 * scan_bitmap_next_tuple need to exist, or neither.
+	 * This is an optional callback.
 	 */
-	bool		(*scan_bitmap_next_block) (TableScanDesc scan,
+	bool		(*scan_bitmap_next_tuple) (TableScanDesc scan,
+										   TupleTableSlot *slot,
 										   bool *recheck,
 										   uint64 *lossy_pages,
 										   uint64 *exact_pages);
 
-	/*
-	 * Fetch the next tuple of a bitmap table scan into `slot` and return true
-	 * if a visible tuple was found, false otherwise.
-	 *
-	 * Optional callback, but either both scan_bitmap_next_block and
-	 * scan_bitmap_next_tuple need to exist, or neither.
-	 */
-	bool		(*scan_bitmap_next_tuple) (TableScanDesc scan,
-										   TupleTableSlot *slot);
-
 	/*
 	 * Prepare to fetch tuples from the next block in a sample scan. Return
 	 * false if the sample scan is finished, true otherwise. `scan` was
@@ -1939,51 +1919,24 @@ table_relation_estimate_size(Relation rel, int32 *attr_widths,
  */
 
 /*
- * Prepare to fetch / check / return tuples as part of a bitmap table scan.
- * `scan` needs to have been started via table_beginscan_bm(). Returns false
- * if there are no more blocks in the bitmap, true otherwise.
- *
- * `lossy_pages` and `exact_pages` are EXPLAIN counters that can be
- * incremented by the table AM to indicate whether or not the block's
- * representation in the bitmap is lossy.
+ * Fetch / check / return tuples as part of a bitmap table scan. `scan` needs
+ * to have been started via table_beginscan_bm(). Fetches the next tuple of
+ * the scan into `slot`, returning true if a visible tuple was found and
+ * false otherwise.
  *
- * `recheck` is set by the table AM to indicate whether or not the tuples
- * from this block should be rechecked.
+ * `recheck` is set by the table AM to indicate whether or not the tuple in
+ * `slot` should be rechecked. Tuples from lossy pages will always need to be
+ * rechecked, but some non-lossy pages' tuples may also require recheck.
  *
- * Note, this is an optionally implemented function, therefore should only be
- * used after verifying the presence (at plan time or such).
+ * `lossy_pages` is incremented if the block's representation in the bitmap is
+ * lossy; otherwise, `exact_pages` is incremented.
  */
 static inline bool
-table_scan_bitmap_next_block(TableScanDesc scan,
+table_scan_bitmap_next_tuple(TableScanDesc scan,
+							 TupleTableSlot *slot,
 							 bool *recheck,
 							 uint64 *lossy_pages,
 							 uint64 *exact_pages)
-{
-	/*
-	 * We don't expect direct calls to table_scan_bitmap_next_block with valid
-	 * CheckXidAlive for catalog or regular tables.  See detailed comments in
-	 * xact.c where these variables are declared.
-	 */
-	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
-		elog(ERROR, "unexpected table_scan_bitmap_next_block call during logical decoding");
-
-	return scan->rs_rd->rd_tableam->scan_bitmap_next_block(scan,
-														   recheck,
-														   lossy_pages,
-														   exact_pages);
-}
-
-/*
- * Fetch the next tuple of a bitmap table scan into `slot` and return true if
- * a visible tuple was found, false otherwise.
- * table_scan_bitmap_next_block() needs to previously have selected a
- * block (i.e. returned true), and no previous
- * table_scan_bitmap_next_tuple() for the same block may have
- * returned false.
- */
-static inline bool
-table_scan_bitmap_next_tuple(TableScanDesc scan,
-							 TupleTableSlot *slot)
 {
 	/*
 	 * We don't expect direct calls to table_scan_bitmap_next_tuple with valid
@@ -1994,7 +1947,10 @@ table_scan_bitmap_next_tuple(TableScanDesc scan,
 		elog(ERROR, "unexpected table_scan_bitmap_next_tuple call during logical decoding");
 
 	return scan->rs_rd->rd_tableam->scan_bitmap_next_tuple(scan,
-														   slot);
+														   slot,
+														   recheck,
+														   lossy_pages,
+														   exact_pages);
 }
 
 /*
-- 
2.34.1
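
With the block-level callback gone, the body of BitmapHeapNext() reduces to
roughly the following sketch (names are the BitmapHeapScanState fields used
in the nodeBitmapHeapscan.c hunk above; error paths abbreviated):

	while (table_scan_bitmap_next_tuple(scan, slot, &recheck,
										&lossy_pages, &exact_pages))
	{
		CHECK_FOR_INTERRUPTS();

		if (recheck)
		{
			econtext->ecxt_scantuple = slot;
			if (!ExecQualAndReset(bitmapqualorig, econtext))
			{
				/* fails recheck, so drop it and loop back for another */
				ExecClearTuple(slot);
				continue;
			}
		}

		/* OK to return this tuple */
		return slot;
	}

	return NULL;				/* bitmap exhausted */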
