Hi All,

In try_partitionwise_join() we create SpecialJoinInfo structures for
child joins by translating SpecialJoinInfo structures applicable to
the parent join. These SpecialJoinInfos are not used outside
try_partitionwise_join() but we do not free memory allocated to those.
In try_partitionwise_join() we create as many SpecialJoinInfos as the
number of partitions. But try_partitionwise_join() itself is called as
many times as the number of join orders for a given join. Thus the
number of child SpecialJoinInfos that are created increases
exponentially with the number of partitioned tables being joined.

The attached patch (0002) fixes this issue by
1. declaring child SpecialJoinInfo as a local variable, thus
allocating memory on the stack instead of CurrentMemoryContext. The
memory on the stack is freed automatically.
2. Freeing the Relids in SpecialJoinInfo explicitly after
SpecialJoinInfo has been used.

We can not free the object trees in SpecialJoinInfo since those may be
referenced in the paths.

Similar to my previous emails [1], the memory consumption for given
queries is measured using attached patch (0001). The table definitions
and helper functions can be found in setup.sql and queries.sql.
Following table shows the reduction in the memory using the attached
patch (0002).

Number of      | without     |             |         | Absolute    |
joining tables | patch       | with patch  | % diff  | diff        |
--------------------------------------------------------------------
             2 |    40.9 MiB |    39.9 MiB |   2.27% |   925.7 KiB |
             3 |   151.7 MiB |   142.6 MiB |   5.97% |     9.0 MiB |
             4 |   464.0 MiB |   418.4 MiB |   9.83% |    45.6 MiB |
             5 |  1663.9 MiB |  1419.4 MiB |  14.69% |   244.5 MiB |

Since the memory consumed by SpecialJoinInfos is exponentially
proportional to the number of tables being joined, we see that the
patch is able to save more memory at higher number of tables joined.

The attached patch (0002) frees members of individual
SpecialJoinInfos. It might have some impact on the planning time. We
may improve that by allocating the members in a separate memory
context. The memory context will be allocated and deleted in each
invocation of try_partitionwise_join(). I am assuming that deleting a
memory context is more efficient than freeing bits of memory multiple
times. But I have not tried that approach.

Another approach would be to track the SpecialJoinInfo translations
(similar to RestrictListInfo translations proposed in other email
thread [2]) and avoid repeatedly translating the same SpecialJoinInfo.
The approach will have similar disadvantages that it may increase size
of SpecialJoinInfo structure even when planning the queries which do
not need partitionwise join. So I haven't tried that approach yet.

At this stage, I am looking for opinions on
1. whether it's worth reducing this memory
2. any suggestions/comments on which approach from above is more
suitable or if there are other approaches

References
[1] 
https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAExHW5s=bclmmq8n_bn6iu+pjau0ds3z_6dn6ile69esmsp...@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment: queries.sql
Description: application/sql

Attachment: setup.sql
Description: application/sql

From a5d8492517121ddc0366a4a853e041e3d4ab663d Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Mon, 24 Jul 2023 20:20:53 +0530
Subject: [PATCH 2/2] Reduce memory used by child SpecialJoinInfo

The SpecialJoinInfo applicable to a child join is computed by
translating the same applicable to the parent join in
try_partitionwise_join(). The child SpecialJoinInfo is not needed once
the child join RelOptInfo is created and paths are added to it. Use a
local variable to hold child SpecialJoinInfo so that it doesn't need
memory allocated separately.

Also free the memory allocated to various Bitmapsets in SpecialJoinInfo.
Those are not referenced anywhere. But we do not free the memory
allocated to expression trees of operator OID list since those may be
referenced in paths or other objects.
---
 src/backend/optimizer/path/joinrels.c | 66 +++++++++++++++++++--------
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 015a0b3cbe..51a811c6d9 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -42,9 +42,11 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 								   RelOptInfo *rel2, RelOptInfo *joinrel,
 								   SpecialJoinInfo *parent_sjinfo,
 								   List *parent_restrictlist);
-static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
-												SpecialJoinInfo *parent_sjinfo,
-												Relids left_relids, Relids right_relids);
+static void build_child_join_sjinfo(PlannerInfo *root,
+									SpecialJoinInfo *parent_sjinfo,
+									Relids left_relids, Relids right_relids,
+									SpecialJoinInfo *child_sjinfo);
+static void free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo);
 static void compute_partition_bounds(PlannerInfo *root, RelOptInfo *rel1,
 									 RelOptInfo *rel2, RelOptInfo *joinrel,
 									 SpecialJoinInfo *parent_sjinfo,
@@ -1540,7 +1542,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		RelOptInfo *child_rel2;
 		bool		rel1_empty;
 		bool		rel2_empty;
-		SpecialJoinInfo *child_sjinfo;
+		SpecialJoinInfo child_sjinfo;
 		List	   *child_restrictlist;
 		RelOptInfo *child_joinrel;
 		AppendRelInfo **appinfos;
@@ -1635,9 +1637,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		 * Construct SpecialJoinInfo from parent join relations's
 		 * SpecialJoinInfo.
 		 */
-		child_sjinfo = build_child_join_sjinfo(root, parent_sjinfo,
-											   child_rel1->relids,
-											   child_rel2->relids);
+		build_child_join_sjinfo(root, parent_sjinfo, child_rel1->relids,
+								child_rel2->relids, &child_sjinfo);
 
 		/* Find the AppendRelInfo structures */
 		appinfos = find_appinfos_by_relids(root,
@@ -1660,7 +1661,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		{
 			child_joinrel = build_child_join_rel(root, child_rel1, child_rel2,
 												 joinrel, child_restrictlist,
-												 child_sjinfo);
+												 &child_sjinfo);
 			joinrel->part_rels[cnt_parts] = child_joinrel;
 			joinrel->live_parts = bms_add_member(joinrel->live_parts, cnt_parts);
 			joinrel->all_partrels = bms_add_members(joinrel->all_partrels,
@@ -1674,10 +1675,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 
 		/* And make paths for the child join */
 		populate_joinrel_with_paths(root, child_rel1, child_rel2,
-									child_joinrel, child_sjinfo,
+									child_joinrel, &child_sjinfo,
 									child_restrictlist);
 
 		pfree(appinfos);
+		free_child_sjinfo_members(&child_sjinfo);
 	}
 }
 
@@ -1686,42 +1688,66 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
  * SpecialJoinInfo for the join between parents. left_relids and right_relids
  * are the relids of left and right side of the join respectively.
  */
-static SpecialJoinInfo *
+static void 
 build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
-						Relids left_relids, Relids right_relids)
+						Relids left_relids, Relids right_relids,
+						SpecialJoinInfo *child_sjinfo)
 {
-	SpecialJoinInfo *sjinfo = makeNode(SpecialJoinInfo);
 	AppendRelInfo **left_appinfos;
 	int			left_nappinfos;
 	AppendRelInfo **right_appinfos;
 	int			right_nappinfos;
 
-	memcpy(sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo));
+	memcpy(child_sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo));
 	left_appinfos = find_appinfos_by_relids(root, left_relids,
 											&left_nappinfos);
 	right_appinfos = find_appinfos_by_relids(root, right_relids,
 											 &right_nappinfos);
 
-	sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand,
+	child_sjinfo->min_lefthand = adjust_child_relids(child_sjinfo->min_lefthand,
 											   left_nappinfos, left_appinfos);
-	sjinfo->min_righthand = adjust_child_relids(sjinfo->min_righthand,
+	child_sjinfo->min_righthand = adjust_child_relids(child_sjinfo->min_righthand,
 												right_nappinfos,
 												right_appinfos);
-	sjinfo->syn_lefthand = adjust_child_relids(sjinfo->syn_lefthand,
+	child_sjinfo->syn_lefthand = adjust_child_relids(child_sjinfo->syn_lefthand,
 											   left_nappinfos, left_appinfos);
-	sjinfo->syn_righthand = adjust_child_relids(sjinfo->syn_righthand,
+	child_sjinfo->syn_righthand = adjust_child_relids(child_sjinfo->syn_righthand,
 												right_nappinfos,
 												right_appinfos);
 	/* outer-join relids need no adjustment */
-	sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root,
-															 (Node *) sjinfo->semi_rhs_exprs,
+	child_sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root,
+															 (Node *) child_sjinfo->semi_rhs_exprs,
 															 right_nappinfos,
 															 right_appinfos);
 
 	pfree(left_appinfos);
 	pfree(right_appinfos);
+}
+
+/*
+ * free_child_sjinfo_members
+ *		Free memory consumed by members of a child SpecialJoinInfo. 
+ */
+static void
+free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
+{
+	/*
+	 * The relids are used only for comparison and their references are not
+	 * stored anywhere. Free those.
+	 */
+	bms_free(child_sjinfo->min_lefthand);
+	bms_free(child_sjinfo->min_righthand);
+	bms_free(child_sjinfo->syn_lefthand);
+	bms_free(child_sjinfo->syn_righthand);
+	bms_free(child_sjinfo->commute_above_l);
+	bms_free(child_sjinfo->commute_above_r);
+	bms_free(child_sjinfo->commute_below_l);
+	bms_free(child_sjinfo->commute_below_r);
 
-	return sjinfo;
+	/*
+	 * But the list of operator OIDs and the list of expressions may be
+	 * referenced somewhere else. Do not free those.
+	 */
 }
 
 /*
-- 
2.25.1

From 3b9367f743923263b56054a79f7f36cd710904e1 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Wed, 12 Jul 2023 14:34:14 +0530
Subject: [PATCH 1/2] Report memory used for planning a query in EXPLAIN
 ANALYZE

The memory used in the CurrentMemoryContext and its children is sampled
before and after calling pg_plan_query() from ExplainOneQuery(). The
difference in the two samples is reported as the memory consumed while
planning the query. This may not account for the memory allocated in
memory contexts which are not children of CurrentMemoryContext. These
contexts are usually other long lived contexts, e.g.
CacheMemoryContext, which are shared by all the queries run in a
session. The consumption in those can not be attributed only to a given
query and hence should not be reported any way.

The memory consumption is reported as "Planning Memory" property in
EXPLAIN ANALYZE output.

Ashutosh Bapat
---
 src/backend/commands/explain.c | 12 ++++++++++--
 src/backend/commands/prepare.c |  7 ++++++-
 src/backend/utils/mmgr/mcxt.c  | 19 +++++++++++++++++++
 src/include/commands/explain.h |  3 ++-
 src/include/utils/memutils.h   |  1 +
 5 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..9f859949f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions,
 					planduration;
 		BufferUsage bufusage_start,
 					bufusage;
+		Size		mem_consumed;
 
 		if (es->buffers)
 			bufusage_start = pgBufferUsage;
 		INSTR_TIME_SET_CURRENT(planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 		/* plan the query */
 		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+						- mem_consumed;
 
 		/* calc differences of buffer counters. */
 		if (es->buffers)
@@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 
 		/* run it (if needed) and produce output */
 		ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
-					   &planduration, (es->buffers ? &bufusage : NULL));
+					   &planduration, (es->buffers ? &bufusage : NULL), &mem_consumed);
 	}
 }
 
@@ -527,7 +531,7 @@ void
 ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage)
+			   const BufferUsage *bufusage, const Size *mem_consumed)
 {
 	DestReceiver *dest;
 	QueryDesc  *queryDesc;
@@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
 		ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es);
+
+		if (mem_consumed)
+			ExplainPropertyUInteger("Planning Memory", "bytes",
+										(uint64) *mem_consumed, es);
 	}
 
 	/* Print info about runtime of triggers */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..84544ce481 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	Size		mem_consumed;
 
 	if (es->buffers)
 		bufusage_start = pgBufferUsage;
 	INSTR_TIME_SET_CURRENT(planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 	/* Look it up in the hash table */
 	entry = FetchPreparedStatement(execstmt->name, true);
@@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 	INSTR_TIME_SET_CURRENT(planduration);
 	INSTR_TIME_SUBTRACT(planduration, planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+					- mem_consumed;
 
 	/* calc differences of buffer counters. */
 	if (es->buffers)
@@ -640,7 +644,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 		if (pstmt->commandType != CMD_UTILITY)
 			ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
-						   &planduration, (es->buffers ? &bufusage : NULL));
+						   &planduration, (es->buffers ? &bufusage : NULL),
+						   &mem_consumed);
 		else
 			ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
 							  paramLI, queryEnv);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9fc83f11f6..43af271f33 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -747,6 +747,25 @@ MemoryContextStatsDetail(MemoryContext context, int max_children,
 								 grand_totals.totalspace - grand_totals.freespace)));
 }
 
+/*
+ * Compute the memory used in the given context and its children.
+ *
+ * XXX: Instead of this separate function we could modify
+ * MemoryContextStatsDetail() to report used memory and disable printing the
+ * detailed stats.
+ */
+extern Size
+MemoryContextMemUsed(MemoryContext context)
+{
+	MemoryContextCounters grand_totals;
+
+	memset(&grand_totals, 0, sizeof(grand_totals));
+
+	MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, false);
+
+	return grand_totals.totalspace - grand_totals.freespace;
+}
+
 /*
  * MemoryContextStatsInternal
  *		One recursion level for MemoryContextStats
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..21e3d2f309 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -92,7 +92,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
 						   ExplainState *es, const char *queryString,
 						   ParamListInfo params, QueryEnvironment *queryEnv,
 						   const instr_time *planduration,
-						   const BufferUsage *bufusage);
+						   const BufferUsage *bufusage,
+						   const Size *mem_consumed);
 
 extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
 extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 21640d62a6..d7c477f229 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -92,6 +92,7 @@ extern void MemoryContextStatsDetail(MemoryContext context, int max_children,
 									 bool print_to_stderr);
 extern void MemoryContextAllowInCriticalSection(MemoryContext context,
 												bool allow);
+extern Size MemoryContextMemUsed(MemoryContext context);
 
 #ifdef MEMORY_CONTEXT_CHECKING
 extern void MemoryContextCheck(MemoryContext context);
-- 
2.25.1

Reply via email to