Re: Parallel Full Hash Join

Melanie Plageman Thu, 11 Feb 2021 14:02:37 -0800

On Tue, Dec 29, 2020 at 03:28:12PM +1300, Thomas Munro wrote:
> I had some feedback I meant to
> post in November but didn't get around to:
> 
>   *  PHJ_BATCH_PROBING        -- all probe
> - *  PHJ_BATCH_DONE           -- end
> +
> + *  PHJ_BATCH_DONE           -- queries not requiring inner fill done
> + *  PHJ_BATCH_FILL_INNER_DONE -- inner fill completed, all queries done
> 
> Would it be better/tidier to keep _DONE as the final phase?  That is,
> to switch around these two final phases.  Or does that make it too
> hard to coordinate the detach-and-cleanup logic?


I updated this to use your suggestion. My rationale for having
PHJ_BATCH_DONE and then PHJ_BATCH_FILL_INNER_DONE was that, for a worker
attaching to the batch for the first time, it might be confusing that it
is in the PHJ_BATCH_FILL_INNER state (not the DONE state) and yet that
worker still just detaches and moves on. It didn't seem intuitive.
Anyway, I think that is all sort of confusing and unnecessary. I changed
it to PHJ_BATCH_FILLING_INNER -- then when a worker who hasn't ever been
attached to this batch before attaches, it will be in the
PHJ_BATCH_FILLING_INNER phase, which it cannot help with and it will
detach and move on.

> 
> +/*
> + * ExecPrepHashTableForUnmatched
> + *             set up for a series of ExecScanHashTableForUnmatched calls
> + *             return true if this worker is elected to do the
> unmatched inner scan
> + */
> +bool
> +ExecParallelPrepHashTableForUnmatched(HashJoinState *hjstate)
> 
> Comment name doesn't match function name.

Updated -- and a few other comment updates too.

I just attached the diff.

>From 213c36f9e125f52eb6731005d5dcdadccccca73a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 11 Feb 2021 16:31:37 -0500
Subject: [PATCH v4 2/2] Update comments and phase naming

---
 src/backend/executor/nodeHash.c     | 19 +++++++++++++------
 src/backend/executor/nodeHashjoin.c |  4 ++--
 src/include/executor/hashjoin.h     |  4 ++--
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index dd8d12203a..6305688efd 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -2047,11 +2047,10 @@ void
 ExecPrepHashTableForUnmatched(HashJoinState *hjstate)
 {
 	/*----------
-	 * During this scan we use the HashJoinState fields as follows:
+	 * During this scan we use the HashJoinTable fields as follows:
 	 *
-	 * hj_CurBucketNo: next regular bucket to scan
-	 * hj_CurSkewBucketNo: next skew bucket (an index into skewBucketNums)
-	 * hj_CurTuple: last tuple returned, or NULL to start next bucket
+	 * current_chunk: current HashMemoryChunk to scan
+	 * current_chunk_idx: index in current HashMemoryChunk
 	 *----------
 	 */
 	HashJoinTable hashtable = hjstate->hj_HashTable;
@@ -2064,13 +2063,21 @@ ExecPrepHashTableForUnmatched(HashJoinState *hjstate)
 }
 
 /*
- * ExecPrepHashTableForUnmatched
- *		set up for a series of ExecScanHashTableForUnmatched calls
+ * ExecParallelPrepHashTableForUnmatched
+ *		set up for a series of ExecParallelScanHashTableForUnmatched calls
  *		return true if this worker is elected to do the unmatched inner scan
  */
 bool
 ExecParallelPrepHashTableForUnmatched(HashJoinState *hjstate)
 {
+	/*----------
+	 * During this scan we use the ParallelHashJoinBatchAccessor fields for the
+	 * current batch as follows:
+	 *
+	 * current_chunk: current HashMemoryChunk to scan
+	 * current_chunk_idx: index in current HashMemoryChunk
+	 *----------
+	 */
 	HashJoinTable hashtable = hjstate->hj_HashTable;
 	int			curbatch = hashtable->curbatch;
 	ParallelHashJoinBatchAccessor *batch_accessor = &hashtable->batches[curbatch];
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 37b49369aa..40c483cd0c 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1183,10 +1183,10 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
 					ExecParallelHashTableSetCurrentBatch(hashtable, batchno);
 					sts_begin_parallel_scan(hashtable->batches[batchno].outer_tuples);
 					return true;
-				case PHJ_BATCH_DONE:
+				case PHJ_BATCH_FILLING_INNER:
 					/* Fall through. */
 
-				case PHJ_BATCH_FILL_INNER_DONE:
+				case PHJ_BATCH_DONE:
 
 					/*
 					 * Already done.  Detach and go around again (if any
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 634a212142..66fea4ac58 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -274,8 +274,8 @@ typedef struct ParallelHashJoinState
 #define PHJ_BATCH_ALLOCATING			1
 #define PHJ_BATCH_LOADING				2
 #define PHJ_BATCH_PROBING				3
-#define PHJ_BATCH_DONE			        4
-#define PHJ_BATCH_FILL_INNER_DONE		5
+#define PHJ_BATCH_FILLING_INNER			4
+#define PHJ_BATCH_DONE		            5
 
 /* The phases of batch growth while hashing, for grow_batches_barrier. */
 #define PHJ_GROW_BATCHES_ELECTING		0
-- 
2.25.1

Re: Parallel Full Hash Join

Reply via email to