(Adding Dilip, the original author of the parallel bitmap heap scan
patch all those years ago, in case you remember anything about the
snapshot stuff below.)
On 27/02/2024 16:22, Melanie Plageman wrote:
> On Mon, Feb 26, 2024 at 08:50:28PM -0500, Melanie Plageman wrote:
>> On Fri, Feb 16, 2024 at 12:35:59PM -0500, Melanie Plageman wrote:
>>> In the attached v3, I've reordered the commits, updated some errant
>>> comments, and improved the commit messages.
>>>
>>> I've also made some updates to the TIDBitmap API that seem like a
>>> clarity improvement to the API in general. These also reduce the diff
>>> for GIN when separating the TBMIterateResult from the
>>> TBM[Shared]Iterator. And these TIDBitmap API changes are now all in
>>> their own commits (previously those were in the same commit as adding
>>> the BitmapHeapScan streaming read user).
>>>
>>> The three outstanding issues I see in the patch set are:
>>> 1) the lossy and exact page counters issue described in my previous
>>> email
>> I've resolved this. I added a new patch to the set which starts counting
>> even pages with no visible tuples toward lossy and exact pages. After an
>> off-list conversation with Andres, it seems that this omission in master
>> may not have been intentional.
>>
>> Once we have only two types of pages to differentiate between (lossy and
>> exact [no longer have to care about "has no visible tuples"]), it is
>> easy enough to pass a "lossy" boolean parameter to
>> table_scan_bitmap_next_block(). I've done this in the attached v4.
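Just to check that I'm reading that right: the executor already knows which
case it is from the TBMIterateResult, so the change is essentially to
classify the page up front and hand that flag to the table AM. Something
along these lines (a sketch only, not necessarily the exact v4 spelling):

    bool        lossy = (tbmres->ntuples < 0);  /* lossy pages carry no offsets */

    if (lossy)
        node->lossy_pages++;
    else
        node->exact_pages++;

    /* hypothetical extra parameter; the exact v4 signature may differ */
    if (!table_scan_bitmap_next_block(scan, tbmres, lossy))
        continue;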
> Thomas posted a new version of the Streaming Read API [1], so here is a
> rebased v5. This should make it easier to review as it can be applied on
> top of master.
Lots of discussion is happening on the performance results, but it seems
that there is no performance impact with the preliminary patches up to
v5-0013-Streaming-Read-API.patch. I'm focusing purely on those
preliminary patches now, because I think they're worthwhile cleanups
independent of the streaming read API.
Andres already commented on the snapshot stuff on an earlier patch
version, and that's much nicer with this version. However, I don't
understand why a parallel bitmap heap scan needs to do anything at all
with the snapshot, even before these patches. The parallel worker
infrastructure already passes the active snapshot from the leader to the
parallel worker. Why does the bitmap heap scan code need to do that too?
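If I recall the generic mechanism correctly, it already does roughly this
(paraphrased from memory, not the literal code in parallel.c):

    /* leader, InitializeParallelDSM(): stash the active snapshot in the DSM */
    asnapspace = shm_toc_allocate(pcxt->toc,
                                  EstimateSnapshotSpace(active_snapshot));
    SerializeSnapshot(active_snapshot, asnapspace);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, asnapspace);

    /* worker, ParallelWorkerMain(): restore it and make it active */
    asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
    PushActiveSnapshot(RestoreSnapshot(asnapspace));

and the worker's es_snapshot is then built from that restored active
snapshot when the query descriptor is recreated in the worker.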
I disabled that with:
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -874,7 +874,9 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
pstate = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id,
false);
node->pstate = pstate;
+#if 0
node->worker_snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
Assert(IsMVCCSnapshot(node->worker_snapshot));
RegisterSnapshot(node->worker_snapshot);
+#endif
}
and ran "make check-world". All the tests passed. To be even more sure,
I added some code there to assert that the serialized version of
node->ss.ps.state->es_snapshot is equal to pstate->phs_snapshot_data,
and all the tests passed with that too.
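The extra check was along these lines (sketch, details elided): serialize
the worker's es_snapshot and compare it byte for byte with what the leader
put into shared memory.

    {
        EState     *estate = node->ss.ps.state;
        Size        len = EstimateSnapshotSpace(estate->es_snapshot);
        char       *buf = palloc(len);

        SerializeSnapshot(estate->es_snapshot, buf);
        Assert(memcmp(buf, pstate->phs_snapshot_data, len) == 0);
        pfree(buf);
    }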
I propose that we just remove the code in BitmapHeapScan to serialize
the snapshot, per attached patch.
--
Heikki Linnakangas
Neon (https://neon.tech)
From 265d77efa2b56d5cfc3caf7843058b7336524860 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Wed, 13 Mar 2024 15:25:07 +0200
Subject: [PATCH 1/1] Remove redundant snapshot copying from parallel leader to
workers
The parallel query infrastructure copies the leader backend's active
snapshot to the worker processes. But the BitmapHeapScan node also had
bespoke code to pass the snapshot from the leader to the workers. That was
redundant, so remove it.
Discussion: XX
---
src/backend/access/table/tableam.c | 10 ----------
src/backend/executor/nodeBitmapHeapscan.c | 17 ++---------------
src/include/access/tableam.h | 5 -----
src/include/nodes/execnodes.h | 4 ----
4 files changed, 2 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 6ed8cca05a1..e57a0b7ea31 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -120,16 +120,6 @@ table_beginscan_catalog(Relation relation, int nkeys, struct ScanKeyData *key)
NULL, flags);
}
-void
-table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
-{
- Assert(IsMVCCSnapshot(snapshot));
-
- RegisterSnapshot(snapshot);
- scan->rs_snapshot = snapshot;
- scan->rs_flags |= SO_TEMP_SNAPSHOT;
-}
-
/* ----------------------------------------------------------------------------
* Parallel table scan related functions.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 345b67649ea..ca548e44eb4 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -721,7 +721,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
scanstate->prefetch_iterator = NULL;
scanstate->prefetch_pages = 0;
scanstate->prefetch_target = 0;
- scanstate->pscan_len = 0;
scanstate->initialized = false;
scanstate->shared_tbmiterator = NULL;
scanstate->shared_prefetch_iterator = NULL;
@@ -841,13 +840,7 @@ void
ExecBitmapHeapEstimate(BitmapHeapScanState *node,
ParallelContext *pcxt)
{
- EState *estate = node->ss.ps.state;
-
- node->pscan_len = add_size(offsetof(ParallelBitmapHeapState,
- phs_snapshot_data),
- EstimateSnapshotSpace(estate->es_snapshot));
-
- shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len);
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelBitmapHeapState));
shm_toc_estimate_keys(&pcxt->estimator, 1);
}
@@ -862,14 +855,13 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
ParallelContext *pcxt)
{
ParallelBitmapHeapState *pstate;
- EState *estate = node->ss.ps.state;
dsa_area *dsa = node->ss.ps.state->es_query_dsa;
/* If there's no DSA, there are no workers; initialize nothing. */
if (dsa == NULL)
return;
- pstate = shm_toc_allocate(pcxt->toc, node->pscan_len);
+ pstate = shm_toc_allocate(pcxt->toc, sizeof(ParallelBitmapHeapState));
pstate->tbmiterator = 0;
pstate->prefetch_iterator = 0;
@@ -881,7 +873,6 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
pstate->state = BM_INITIAL;
ConditionVariableInit(&pstate->cv);
- SerializeSnapshot(estate->es_snapshot, pstate->phs_snapshot_data);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pstate);
node->pstate = pstate;
@@ -927,13 +918,9 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelBitmapHeapState *pstate;
- Snapshot snapshot;
Assert(node->ss.ps.state->es_query_dsa != NULL);
pstate = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->pstate = pstate;
-
- snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
- table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot);
}
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f8474871d2..8249b37bbf1 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -1038,11 +1038,6 @@ table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
allow_pagemode);
}
-/*
- * Update snapshot used by the scan.
- */
-extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
-
/*
* Return next tuple from `scan`, store in slot.
*/
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 444a5f0fd57..27614ab50fb 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1689,7 +1689,6 @@ typedef enum
* prefetch_target current target prefetch distance
* state current state of the TIDBitmap
* cv conditional wait variable
- * phs_snapshot_data snapshot data shared to workers
* ----------------
*/
typedef struct ParallelBitmapHeapState
@@ -1701,7 +1700,6 @@ typedef struct ParallelBitmapHeapState
int prefetch_target;
SharedBitmapState state;
ConditionVariable cv;
- char phs_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
} ParallelBitmapHeapState;
/* ----------------
@@ -1721,7 +1719,6 @@ typedef struct ParallelBitmapHeapState
* prefetch_pages # pages prefetch iterator is ahead of current
* prefetch_target current target prefetch distance
* prefetch_maximum maximum value for prefetch_target
- * pscan_len size of the shared memory for parallel bitmap
* initialized is node is ready to iterate
* shared_tbmiterator shared iterator
* shared_prefetch_iterator shared iterator for prefetching
@@ -1745,7 +1742,6 @@ typedef struct BitmapHeapScanState
int prefetch_pages;
int prefetch_target;
int prefetch_maximum;
- Size pscan_len;
bool initialized;
TBMSharedIterator *shared_tbmiterator;
TBMSharedIterator *shared_prefetch_iterator;
--
2.39.2