On Tue, 16 May 2023 16:00:52 -0400
Melanie Plageman <melanieplage...@gmail.com> wrote:

> On Tue, May 16, 2023 at 04:00:51PM +0200, Jehan-Guillaume de Rorthais wrote:
> 
> > From e5ecd466172b7bae2f1be294c1a5e70ce2b43ed8 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplage...@gmail.com>
> > Date: Thu, 30 Apr 2020 07:16:28 -0700
> > Subject: [PATCH v8 1/3] Describe hash join implementation
> > 
> > This is just a draft to spark conversation on what a good comment might
> > be like in this file on how the hybrid hash join algorithm is
> > implemented in Postgres. I'm pretty sure this is the accepted term for
> > this algorithm https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join  
> 
> I recommend changing the commit message to something like this:
> 
>       Describe hash join implementation
> 
>       Add details to the block comment in nodeHashjoin.c describing the
>       hybrid hash join algorithm at a high level.
> 
>       Author: Melanie Plageman <melanieplage...@gmail.com>
>       Author: Jehan-Guillaume de Rorthais <j...@dalibo.com>
>       Reviewed-by: Tomas Vondra <tomas.von...@enterprisedb.com>
>       Discussion: https://postgr.es/m/20230516160051.4267a800%40karst

Done, but I assigned myself as a reviewer instead, as I don't remember having
authored anything here beyond the reference to the hybrid hash join page,
which is questionable according to Tomas :)

> > diff --git a/src/backend/executor/nodeHashjoin.c
> > b/src/backend/executor/nodeHashjoin.c index 0a3f32f731..93a78f6f74 100644
> > --- a/src/backend/executor/nodeHashjoin.c
> > ...
> > + * TODO: should we discuss that tuples can only spill forward?  
> 
> I would just cut this for now since we haven't started on an agreed-upon
> wording.

Removed in v9.

> > From 309ad354b7a9e4dfa01b2985bd883829f5e0eba0 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <j...@dalibo.com>
> > Date: Tue, 16 May 2023 15:42:14 +0200
> > Subject: [PATCH v8 2/3] Allocate hash batches related BufFile in a dedicated
> >  context  
> 
> Here is a draft commit message for the second patch:
> 
>     ...

Thanks. Adopted with some minor rewording... hopefully it's OK.

> I recommend editing and adding links to the various discussions on this
> topic from your research.

Done in v9.

> As for the patch itself, I noticed that there are a few things needing
> pgindenting.
>
> I usually do the following to run pgindent (in case you haven't done
> this recently).
> 
> ...

Thank you for your recipe.

> There are some existing indentation issues in these files, but you can
> leave those or put them in a separate commit.

Reindented in v9.

I put existing indentation issues in a separate commit to keep the actual
patches clean from distractions.

> ...
> Add a period at the end of this comment.
> 
> > +   /*
> > +    * Use hash join spill memory context to allocate accessors and
> > their
> > +    * buffers
> > +    */

Fixed in v9.

> Add a period at the end of this comment.
> 
> > +   /* Use hash join spill memory context to allocate accessors */

Fixed in v9.

> > diff --git a/src/backend/executor/nodeHashjoin.c
> > b/src/backend/executor/nodeHashjoin.c @@ -1313,21 +1314,30 @@
> > ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
> >   * The data recorded in the file for each tuple is its hash value,
> >   * then the tuple in MinimalTuple format.
> >   *
> > - * Note: it is important always to call this in the regular executor
> > - * context, not in a shorter-lived context; else the temp file buffers
> > - * will get messed up.
> > + * If this is the first write to the batch file, this function first
> > + * create it.  The associated BufFile buffer is allocated in the given
> > + * context.  It is important to always give the HashSpillContext
> > + * context.  First to avoid a shorter-lived context, else the temp file
> > + * buffers will get messed up.  Second, for a better accounting of the
> > + * spilling memory consumption.
> > + *
> >   */  
> 
> Here is my suggested wording for this block comment:
> 
> The batch file is lazily created. If this is the first tuple written to
> this batch, the batch file is created and its buffer is allocated in the
> given context. It is important to pass in a memory context which will
> live for the entirety of the lifespan of the batch.

Reworded. The context must actually outlive the batch itself, not just live
for the duration of the batch.
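
To make this concrete, here is a rough, illustrative-only sketch (not code
from the patch) of what would go wrong if the BufFile buffer were allocated
in a context that gets reset between batches:

  /* WRONG: batchCxt is reset at the end of every batch */
  oldctx = MemoryContextSwitchTo(hashtable->batchCxt);
  file = BufFileCreateTemp(false);   /* buffer palloc'd in batchCxt */
  MemoryContextSwitchTo(oldctx);
  ...
  MemoryContextReset(hashtable->batchCxt);         /* frees the buffer */
  BufFileWrite(file, &hashvalue, sizeof(uint32));  /* use after free   */

Allocating the buffer in spillCxt, whose lifespan covers the whole join,
avoids this while still keeping spill-related allocations out of hashCxt.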

> >  void
> >  ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
> > -                                     BufFile **fileptr)
> > +                                     BufFile **fileptr, MemoryContext
> > cxt) {  

Note that I changed this to:

 ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                       BufFile **fileptr, HashJoinTable hashtable) {

Since this function must allocate the BufFile buffer in spillCxt, I suppose
we should enforce that explicitly inside the function itself.

Plus, having the hashtable locally could be useful for a later patch, e.g. to
track the allocated memory more precisely in spaceUsed.
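
As a purely hypothetical illustration (not part of v9), with the hashtable at
hand the function could also charge the freshly created buffer against the
hashtable's accounting, assuming one BLCKSZ-sized buffer per batch file:

  if (file == NULL)
  {
      MemoryContext oldctx = MemoryContextSwitchTo(hashtable->spillCxt);

      file = BufFileCreateTemp(false);
      *fileptr = file;

      /* hypothetical: account for the new BufFile buffer */
      hashtable->spaceUsed += BLCKSZ;
      if (hashtable->spaceUsed > hashtable->spacePeak)
          hashtable->spacePeak = hashtable->spaceUsed;

      MemoryContextSwitchTo(oldctx);
  }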

> > diff --git a/src/include/executor/hashjoin.h
> > b/src/include/executor/hashjoin.h index 8ee59d2c71..ac27222d18 100644
> > --- a/src/include/executor/hashjoin.h
> > +++ b/src/include/executor/hashjoin.h
> > @@ -23,12 +23,12 @@
> >  /* ----------------------------------------------------------------
> >   *                         hash-join hash table structures
> >   *
> > - * Each active hashjoin has a HashJoinTable control block, which is
> > - * palloc'd in the executor's per-query context.  All other storage needed
> > - * for the hashjoin is kept in private memory contexts, two for each
> > hashjoin.
> > - * This makes it easy and fast to release the storage when we don't need it
> > - * anymore.  (Exception: data associated with the temp files lives in the
> > - * per-query context too, since we always call buffile.c in that context.)
> > + * Each active hashjoin has a HashJoinTable structure, which is  
> 
> "Other storages" should be "Other storage"
> 
> > + * palloc'd in the executor's per-query context.  Other storages needed for

Fixed in v9.

> ... 
> 
> "mainly hash's meta datas" -> "mainly the hashtable's metadata"
> 
> > + * "hashCxt" (mainly hash's meta datas). Also, the "hashCxt" context is the

Fixed in v9.

> Suggested alternative wording for the below:
> 
> * Data associated with temp files is allocated in the "spillCxt" context
> * which lives for the duration of the entire join as batch files'
> * creation and usage may span batch execution. These files are
> * explicitly destroyed by calling BufFileClose() when the code is done
> * with them. The aim of this context is to help account for the
> * memory allocated for temp files and their buffers.

Adopted in v9.

> Suggested alternative wording for the below:
> 
> * Finally, data used only during a single batch's execution is allocated
> * in the "batchCxt". By resetting the batchCxt at the end of each batch,
> * we free all the per-batch storage reliably and without tedium.

Adopted in v9.

Thank you for your review!

Regards,
From dd7aa7e4c7abaa9a50c7abf9f7420bebb790718b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 30 Apr 2020 07:16:28 -0700
Subject: [PATCH v9 1/3] Describe hash join implementation

Add details to the block comment in nodeHashjoin.c describing the
hybrid hash join algorithm at a high level.

Author: Melanie Plageman <melanieplage...@gmail.com>
Reviewed-by: Tomas Vondra <tomas.von...@enterprisedb.com>
Reviewed-by: Jehan-Guillaume de Rorthais <j...@dalibo.com>
Discussion: https://postgr.es/m/20230516160051.4267a800%40karst
---
 src/backend/executor/nodeHashjoin.c | 39 +++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 0a3f32f731..78e202b4f9 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -10,6 +10,45 @@
  * IDENTIFICATION
  *	  src/backend/executor/nodeHashjoin.c
  *
+ * HASH JOIN
+ *
+ * This is based on the "hybrid hash join" algorithm, described briefly on the
+ * following page and at length in the PDF linked from the page's notes:
+ *
+ *   https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join
+ *
+ * If the inner side tuples of a hash join do not fit in memory, the hash join
+ * can be executed in multiple batches.
+ *
+ * If the statistics on the inner side relation are accurate, the planner
+ * chooses a multi-batch strategy and estimates the number of batches.
+ *
+ * The query executor measures the real size of the hashtable and increases the
+ * number of batches if the hashtable grows too large.
+ *
+ * The number of batches is always a power of two, so an increase in the number
+ * of batches doubles it.
+ *
+ * Serial hash join measures batch size lazily -- waiting until it is loading a
+ * batch to determine if it will fit in memory. While inserting tuples into the
+ * hashtable, serial hash join will, if adding a tuple would exceed work_mem,
+ * dump tuples out of the hashtable and reassign them either to other batch
+ * files or to the current batch resident in the hashtable.
+ *
+ * Parallel hash join, on the other hand, completes all changes to the number
+ * of batches during the build phase. If it increases the number of batches, it
+ * dumps out all the tuples from all batches and reassigns them to entirely new
+ * batch files. Then it checks every batch to ensure it will fit in the space
+ * budget for the query.
+ *
+ * In both parallel and serial hash join, the executor currently makes a best
+ * effort. If a particular batch will not fit in memory, it tries doubling the
+ * number of batches. If, after a batch increase, there is a batch which
+ * retained all or none of its tuples, the executor disables growth in the
+ * number of batches globally. After growth is disabled, all batches that would
+ * have previously triggered an increase in the number of batches instead
+ * exceed the space allowed.
+ *
  * PARALLELISM
  *
  * Hash joins can participate in parallel query execution in several ways.  A
-- 
2.40.1

From c7b70dec3f4c162ea590b53a407c39dfd7ade873 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais <j...@dalibo.com>
Date: Tue, 16 May 2023 15:42:14 +0200
Subject: [PATCH v9 2/3] Dedicated memory context for hash join spill buffers

Should a hash join exceed work_mem, its hashtable is split up into
multiple batches. The number of batches is doubled each time a given
batch is determined not to fit in memory. Each batch file is
allocated with a block-sized buffer for buffering tuples, and parallel
hash join has additional sharedtuplestore accessor buffers.

In some pathological cases requiring a lot of batches, often with
skewed data, bad stats, or very large datasets, users can run
out of memory solely from the memory overhead of all the batch
files' buffers.

Batch files were allocated in the ExecutorState memory context, making
it very hard to identify when this batch explosion was the source of an
OOM. By allocating the batch files in a dedicated memory context, it
should be easier for users to identify the cause of an OOM and work to
avoid it.

Original draft by Tomas Vondra.

Author: Tomas Vondra <tomas.von...@enterprisedb.com>
Author: Jehan-Guillaume de Rorthais <j...@dalibo.com>
Reviewed-by: Melanie Plageman <melanieplage...@gmail.com>
Discussion: https://postgr.es/m/20190421114618.z3mpgmimc3rmubi4@development
Discussion: https://postgr.es/m/20230504193006.1b5b9622%40karst#273020ff4061fc7a2fbb1ba96b281f17
---
 src/backend/executor/nodeHash.c           | 43 ++++++++++++++++-------
 src/backend/executor/nodeHashjoin.c       | 31 ++++++++++++----
 src/backend/utils/sort/sharedtuplestore.c |  8 +++++
 src/include/executor/hashjoin.h           | 30 +++++++++++-----
 src/include/executor/nodeHashjoin.h       |  2 +-
 5 files changed, 84 insertions(+), 30 deletions(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 5fd1c5553b..444d182bca 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -484,7 +484,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 	 *
 	 * The hashtable control block is just palloc'd from the executor's
 	 * per-query memory context.  Everything else should be kept inside the
-	 * subsidiary hashCxt or batchCxt.
+	 * subsidiary hashCxt, batchCxt or spillCxt.
 	 */
 	hashtable = palloc_object(HashJoinTableData);
 	hashtable->nbuckets = nbuckets;
@@ -538,6 +538,10 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 												"HashBatchContext",
 												ALLOCSET_DEFAULT_SIZES);
 
+	hashtable->spillCxt = AllocSetContextCreate(hashtable->hashCxt,
+												"HashSpillContext",
+												ALLOCSET_DEFAULT_SIZES);
+
 	/* Allocate data that will live for the life of the hashjoin */
 
 	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
@@ -570,12 +574,19 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 	if (nbatch > 1 && hashtable->parallel_state == NULL)
 	{
+		MemoryContext oldctx;
+
 		/*
 		 * allocate and initialize the file arrays in hashCxt (not needed for
 		 * parallel case which uses shared tuplestores instead of raw files)
 		 */
+		oldctx = MemoryContextSwitchTo(hashtable->spillCxt);
+
 		hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
 		hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
+
+		MemoryContextSwitchTo(oldctx);
+
 		/* The files will not be opened until needed... */
 		/* ... but make sure we have temp tablespaces established for them */
 		PrepareTempTablespaces();
@@ -913,7 +924,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	int			oldnbatch = hashtable->nbatch;
 	int			curbatch = hashtable->curbatch;
 	int			nbatch;
-	MemoryContext oldcxt;
 	long		ninmemory;
 	long		nfreed;
 	HashMemoryChunk oldchunks;
@@ -934,13 +944,16 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 		   hashtable, nbatch, hashtable->spaceUsed);
 #endif
 
-	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
-
 	if (hashtable->innerBatchFile == NULL)
 	{
+		MemoryContext oldcxt = MemoryContextSwitchTo(hashtable->spillCxt);
+
 		/* we had no file arrays before */
 		hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
 		hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
+
+		MemoryContextSwitchTo(oldcxt);
+
 		/* time to establish the temp tablespaces, too */
 		PrepareTempTablespaces();
 	}
@@ -951,8 +964,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 		hashtable->outerBatchFile = repalloc0_array(hashtable->outerBatchFile, BufFile *, oldnbatch, nbatch);
 	}
 
-	MemoryContextSwitchTo(oldcxt);
-
 	hashtable->nbatch = nbatch;
 
 	/*
@@ -1024,7 +1035,8 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 				Assert(batchno > curbatch);
 				ExecHashJoinSaveTuple(HJTUPLE_MINTUPLE(hashTuple),
 									  hashTuple->hashvalue,
-									  &hashtable->innerBatchFile[batchno]);
+									  &hashtable->innerBatchFile[batchno],
+									  hashtable);
 
 				hashtable->spaceUsed -= hashTupleSize;
 				nfreed++;
@@ -1683,7 +1695,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
 		Assert(batchno > hashtable->curbatch);
 		ExecHashJoinSaveTuple(tuple,
 							  hashvalue,
-							  &hashtable->innerBatchFile[batchno]);
+							  &hashtable->innerBatchFile[batchno],
+							  hashtable);
 	}
 
 	if (shouldFree)
@@ -2664,7 +2677,8 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 			/* Put the tuple into a temp file for later batches */
 			Assert(batchno > hashtable->curbatch);
 			ExecHashJoinSaveTuple(tuple, hashvalue,
-								  &hashtable->innerBatchFile[batchno]);
+								  &hashtable->innerBatchFile[batchno],
+								  hashtable);
 			pfree(hashTuple);
 			hashtable->spaceUsed -= tupleSize;
 			hashtable->spaceUsedSkew -= tupleSize;
@@ -3093,8 +3107,11 @@ ExecParallelHashJoinSetUpBatches(HashJoinTable hashtable, int nbatch)
 	pstate->nbatch = nbatch;
 	batches = dsa_get_address(hashtable->area, pstate->batches);
 
-	/* Use hash join memory context. */
-	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
+	/*
+	 * Use hash join spill memory context to allocate accessors and their
+	 * buffers.
+	 */
+	oldcxt = MemoryContextSwitchTo(hashtable->spillCxt);
 
 	/* Allocate this backend's accessor array. */
 	hashtable->nbatch = nbatch;
@@ -3196,8 +3213,8 @@ ExecParallelHashEnsureBatchAccessors(HashJoinTable hashtable)
 	 */
 	Assert(DsaPointerIsValid(pstate->batches));
 
-	/* Use hash join memory context. */
-	oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
+	/* Use hash join spill memory context to allocate accessors. */
+	oldcxt = MemoryContextSwitchTo(hashtable->spillCxt);
 
 	/* Allocate this backend's accessor array. */
 	hashtable->nbatch = pstate->nbatch;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 78e202b4f9..1092a33525 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -489,7 +489,8 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
 					Assert(parallel_state == NULL);
 					Assert(batchno > hashtable->curbatch);
 					ExecHashJoinSaveTuple(mintuple, hashvalue,
-										  &hashtable->outerBatchFile[batchno]);
+										  &hashtable->outerBatchFile[batchno],
+										  hashtable);
 
 					if (shouldFree)
 						heap_free_minimal_tuple(mintuple);
@@ -1310,22 +1311,38 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
  *
  * The data recorded in the file for each tuple is its hash value,
  * then the tuple in MinimalTuple format.
- *
- * Note: it is important always to call this in the regular executor
- * context, not in a shorter-lived context; else the temp file buffers
- * will get messed up.
  */
 void
 ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-					  BufFile **fileptr)
+					  BufFile **fileptr, HashJoinTable hashtable)
 {
 	BufFile    *file = *fileptr;
 
 	if (file == NULL)
 	{
-		/* First write to this batch file, so open it. */
+		MemoryContext oldctx;
+
+		/*
+		 * The batch file is lazily created. If this is the first tuple
+		 * written to this batch, the batch file is created and its buffer is
+		 * allocated in the spillCxt context, NOT in the batchCxt.
+		 *
+		 * During the build phase, inner batches are created with their temp
+		 * file buffers. These buffers are released later, after the batch is
+		 * loaded back into memory during the outer side scan. That is why it
+		 * is important to use a memory context which lives longer than the
+		 * batch itself, or some temp file buffers would get messed up.
+		 *
+		 * Also, we use spillCxt instead of hashCxt to account more accurately
+		 * for the memory consumed by spilling.
+		 */
+
+		oldctx = MemoryContextSwitchTo(hashtable->spillCxt);
+
 		file = BufFileCreateTemp(false);
 		*fileptr = file;
+
+		MemoryContextSwitchTo(oldctx);
 	}
 
 	BufFileWrite(file, &hashvalue, sizeof(uint32));
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 0831249159..236be65f22 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -308,11 +308,15 @@ sts_puttuple(SharedTuplestoreAccessor *accessor, void *meta_data,
 	{
 		SharedTuplestoreParticipant *participant;
 		char		name[MAXPGPATH];
+		MemoryContext oldcxt;
 
 		/* Create one.  Only this backend will write into it. */
 		sts_filename(name, accessor, accessor->participant);
+
+		oldcxt = MemoryContextSwitchTo(accessor->context);
 		accessor->write_file =
 			BufFileCreateFileSet(&accessor->fileset->fs, name);
+		MemoryContextSwitchTo(oldcxt);
 
 		/* Set up the shared state for this backend's file. */
 		participant = &accessor->sts->participants[accessor->participant];
@@ -527,11 +531,15 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 			if (accessor->read_file == NULL)
 			{
 				char		name[MAXPGPATH];
+				MemoryContext oldcxt;
 
 				sts_filename(name, accessor, accessor->read_participant);
+
+				oldcxt = MemoryContextSwitchTo(accessor->context);
 				accessor->read_file =
 					BufFileOpenFileSet(&accessor->fileset->fs, name, O_RDONLY,
 									   false);
+				MemoryContextSwitchTo(oldcxt);
 			}
 
 			/* Seek and load the chunk header. */
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 8ee59d2c71..857ca58f6f 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -23,12 +23,12 @@
 /* ----------------------------------------------------------------
  *				hash-join hash table structures
  *
- * Each active hashjoin has a HashJoinTable control block, which is
- * palloc'd in the executor's per-query context.  All other storage needed
- * for the hashjoin is kept in private memory contexts, two for each hashjoin.
- * This makes it easy and fast to release the storage when we don't need it
- * anymore.  (Exception: data associated with the temp files lives in the
- * per-query context too, since we always call buffile.c in that context.)
+ * Each active hashjoin has a HashJoinTable structure, which is
+ * palloc'd in the executor's per-query context.  Other storage needed for
+ * each hashjoin is kept in child contexts, three for each hashjoin:
+ *   - HashTableContext (hashCxt): the parent hash table storage context
+ *   - HashSpillContext (spillCxt): storage for temp file buffers
+ *   - HashBatchContext (batchCxt): storage for a batch in serial hash join
  *
  * The hashtable contexts are made children of the per-query context, ensuring
  * that they will be discarded at end of statement even if the join is
@@ -36,9 +36,20 @@
  * be cleaned up by the virtual file manager in event of an error.)
  *
  * Storage that should live through the entire join is allocated from the
- * "hashCxt", while storage that is only wanted for the current batch is
- * allocated in the "batchCxt".  By resetting the batchCxt at the end of
- * each batch, we free all the per-batch storage reliably and without tedium.
+ * "hashCxt" (mainly the hashtable's metadata). Also, the "hashCxt" context is
+ * the parent of "spillCxt" and "batchCxt". This makes it easy and fast to
+ * release the storage when we don't need it anymore.
+ *
+ * Data associated with temp files is allocated in the "spillCxt" context
+ * which lives for the duration of the entire join as batch files'
+ * creation and usage may span batch execution. These files are
+ * explicitly destroyed by calling BufFileClose() when the code is done
+ * with them. The aim of this context is to help account for the
+ * memory allocated for temp files and their buffers.
+ *
+ * Finally, data used only during a single batch's execution is allocated
+ * in the "batchCxt". By resetting the batchCxt at the end of each batch,
+ * we free all the per-batch storage reliably and without tedium.
  *
  * During first scan of inner relation, we get its tuples from executor.
  * If nbatch > 1 then tuples that don't belong in first batch get saved
@@ -350,6 +361,7 @@ typedef struct HashJoinTableData
 
 	MemoryContext hashCxt;		/* context for whole-hash-join storage */
 	MemoryContext batchCxt;		/* context for this-batch-only storage */
+	MemoryContext spillCxt;		/* context for spilling to temp files */
 
 	/* used for dense allocation of tuples (into linked chunks) */
 	HashMemoryChunk chunks;		/* one list for the whole batch */
diff --git a/src/include/executor/nodeHashjoin.h b/src/include/executor/nodeHashjoin.h
index d367070883..ccb704ede1 100644
--- a/src/include/executor/nodeHashjoin.h
+++ b/src/include/executor/nodeHashjoin.h
@@ -29,6 +29,6 @@ extern void ExecHashJoinInitializeWorker(HashJoinState *state,
 										 ParallelWorkerContext *pwcxt);
 
 extern void ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
-								  BufFile **fileptr);
+								  BufFile **fileptr, HashJoinTable hashtable);
 
 #endif							/* NODEHASHJOIN_H */
-- 
2.40.1

From 7635c88f654091fedd4b98fb7c19325fc6a0212d Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais <j...@dalibo.com>
Date: Wed, 17 May 2023 19:01:06 +0200
Subject: [PATCH v9 3/3] Run pgindent on nodeHash.c and nodeHashjoin.c

---
 src/backend/executor/nodeHash.c     | 6 +++---
 src/backend/executor/nodeHashjoin.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 444d182bca..7571db4c1d 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -1339,7 +1339,7 @@ ExecParallelHashRepartitionFirst(HashJoinTable hashtable)
 			else
 			{
 				size_t		tuple_size =
-				MAXALIGN(HJTUPLE_OVERHEAD + tuple->t_len);
+					MAXALIGN(HJTUPLE_OVERHEAD + tuple->t_len);
 
 				/* It belongs in a later batch. */
 				hashtable->batches[batchno].estimated_size += tuple_size;
@@ -1381,7 +1381,7 @@ ExecParallelHashRepartitionRest(HashJoinTable hashtable)
 	for (i = 1; i < old_nbatch; ++i)
 	{
 		ParallelHashJoinBatch *shared =
-		NthParallelHashJoinBatch(old_batches, i);
+			NthParallelHashJoinBatch(old_batches, i);
 
 		old_inner_tuples[i] = sts_attach(ParallelHashJoinBatchInner(shared),
 										 ParallelWorkerNumber + 1,
@@ -3334,7 +3334,7 @@ ExecHashTableDetachBatch(HashJoinTable hashtable)
 			while (DsaPointerIsValid(batch->chunks))
 			{
 				HashMemoryChunk chunk =
-				dsa_get_address(hashtable->area, batch->chunks);
+					dsa_get_address(hashtable->area, batch->chunks);
 				dsa_pointer next = chunk->next.shared;
 
 				dsa_free(hashtable->area, batch->chunks);
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 1092a33525..16562ccaf8 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1210,7 +1210,7 @@ ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
 		{
 			SharedTuplestoreAccessor *inner_tuples;
 			Barrier    *batch_barrier =
-			&hashtable->batches[batchno].shared->batch_barrier;
+				&hashtable->batches[batchno].shared->batch_barrier;
 
 			switch (BarrierAttach(batch_barrier))
 			{
@@ -1614,7 +1614,7 @@ ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
 {
 	int			plan_node_id = state->js.ps.plan->plan_node_id;
 	ParallelHashJoinState *pstate =
-	shm_toc_lookup(pcxt->toc, plan_node_id, false);
+		shm_toc_lookup(pcxt->toc, plan_node_id, false);
 
 	/*
 	 * It would be possible to reuse the shared hash table in single-batch
@@ -1649,7 +1649,7 @@ ExecHashJoinInitializeWorker(HashJoinState *state,
 	HashState  *hashNode;
 	int			plan_node_id = state->js.ps.plan->plan_node_id;
 	ParallelHashJoinState *pstate =
-	shm_toc_lookup(pwcxt->toc, plan_node_id, false);
+		shm_toc_lookup(pwcxt->toc, plan_node_id, false);
 
 	/* Attach to the space for shared temporary files. */
 	SharedFileSetAttach(&pstate->fileset, pwcxt->seg);
-- 
2.40.1
