Hi All,

Following up on [1] ...

A restrictlist is a list of RestrictInfo nodes, each representing one
clause, applicable to a set of joins.  When partitionwise join is used
as a strategy, the restrictlists for child join are obtained by
translating the restrictlists for the parent in
try_partitionwise_join(). That function is called once for every join
order. E.g. when computing join ABC, it will be called for planning
joins AB, BC, AC, (AB)C, A(BC), B(AC). Every time it is called it will
translate the given parent restrictlist. This means that a
RestrictInfo node applicable to given child relations will be
translated as many times as the join orders in which those child
relations appear in different joining relations.

For example, consider a query "select * from A, B, C where A.a = B.a
and B.a = C.a" where A, B and C are partitioned tables. A has
partitions A1, A2, ... An. B has partitions B1, B2, ... Bn and C has
partitions C1, C2, ... Cn. Partitions Ai, Bi and Ci are matching
partitions respectively for all i's. The join ABC is computed as
Append of (A1B1C1, A2B2C2, ... AnBnCn). The clause A.a = B.a is
translated to A1.a = B1.a thrice, when computing A1B1, A1(B1C1) and
B1(A1C1) respectively. Similarly other clauses are translated multiple
times. Some extra translations also happen in
reparameterize_path_by_child().

These translations consume memory which remains allocated till the
statement finishes. A ResrtictInfo should be translated only once per
parent-child pair, thus avoiding consuming extra memory.

There are two patches attached
0001 - to measure memory consumption during planning. This is the same
one as attached to [1].
0002 - WIP patch to avoid repeated translations of RestrictInfo.
The WIP patch avoids repeated translations by tracking the child for
which a RestrictInfo is translated and reusing the same translation
every time it is requested. In order to track the translations,
RestrictInfo gets two new members.
1. parent_rinfo - In a child's RestrictInfo this points to the
RestrictInfo applicable to the topmost parent in partition hierarchy.
This is NULL in the topmost parent's RestrictInfo
2. child_rinfos - In a parent's RestrictInfo, this is a list that
contains all the translated child RestrictInfos. In child
RestrictInfos this is NULL.

Every translated RestrictInfo is stored in the top parent's
RestrictInfo child_rinfos. RestrictInfo::required_relids is used as a
key to search a given translation. I have intercepted
adjust_appendrel_attrs_mutator() to track translations as well as
avoid multiple translations. It first looks for an existing
translation when translating a RestrictInfo and creates a new one only
when one doesn't exist already.

Using this patch the memory consumption for the above query reduces as follows

Number of partitions: 1000

Number of tables | without patch  | with patch | % reduction |
being joined     |                |            |             |
--------------------------------------------------------------
               2 |      40.3 MiB  |   37.7 MiB |       6.43% |
               3 |     146.8 MiB  |  133.0 MiB |       9.42% |
               4 |     445.4 MiB  |  389.5 MiB |      12.57% |
               5 |    1563.2 MiB  | 1243.2 MiB |      20.47% |

The number of times a RestrictInfo requires to be translated increases
exponentially with the number of tables joined. Thus we see more
memory saved as the number of tables joined increases. When two tables
are joined there's only a single join planned so no extra translations
happen in try_partitionwise_join(). The memory saved in case of 2
joining tables comes from avoiding extra translations happening during
reparameterization of paths (in reparameterize_path_by_child()).

The attached patch is to show how much memory can be saved if we avoid
extra translation. But I want to discuss the following things about
the approach.

1. The patch uses RestrictInfo::required_relids as the key for
searching child RelOptInfos. I am not sure which of the two viz.
required_relids and clause_relids is a better key. required_relids
seems to be a subset of clause_relids and from the description it
looks like that's the set that decides the applicability of a clause
in a join. But clause_relids is obtained from all the Vars that appear
in the clause, so may be that's the one that matters for the
translations. Can somebody guide me?

2. The patch adds two extra pointers per RestrictInfo. They will
remain unused when partitionwise join is not used. Right now, I do not
see any increase in memory consumed by planner because of those
pointers even in case of unpartitioned tables; maybe they are absorbed
in memory alignment. They may show up as extra memory in the future. I
am wondering whether we can instead save and track translations in
PlannerInfo as a hash table using <rinfo_serial, required_relids (or
whatever is the answer to above question) of parent and child
respectively> as key. That will just add one extra pointer in
PlannerInfo when partitionwise join is not used. Please let me know
your suggestions.

3. I have changed adjust_appendrel_attrs_mutator() to return a
translated RestrictInfo if it already exists. IOW, it won't always
return a deep copy of given RestrictInfo as it does today. This can be
fixed by writing wrappers around adjust_appendrel_attrs() to translate
RestrictInfo specifically. But maybe we don't always need deep copies.
Are there any cases when we need translated deep copies of
RestrictInfo? Those cases will require fixing callers of
adjust_appendrel_attrs() instead of the mutator.

4. IIRC, when partitionwise join was implemented we had discussed
creating child RestrictInfos using a login similar to
build_joinrel_restrictlist(). That might be another way to build
RestrictInfo only once and use it multiple times. But we felt that it
was much harder problem to solve since it's not known which partitions
from joining partitioned tables will match and will be joined till we
enter try_partitionwise_join(), so the required RestrictInfos may not
be available in RelOptInfo::joininfo. Let me know your thoughts on
this.

Comments/suggestions welcome.

references
[1] 
https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment: setup.sql
Description: application/sql

Attachment: queries.sql
Description: application/sql

From 0f8b6153c95b5dd42c6549af47ea27ef97fd7c04 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Wed, 12 Jul 2023 14:34:14 +0530
Subject: [PATCH 1/7] Report memory used for planning a query in EXPLAIN
 ANALYZE

The memory used in the CurrentMemoryContext and its children is sampled
before and after calling pg_plan_query() from ExplainOneQuery(). The
difference in the two samples is reported as the memory consumed while
planning the query. This may not account for the memory allocated in
memory contexts which are not children of CurrentMemoryContext. These
contexts are usually other long lived contexts, e.g.
CacheMemoryContext, which are shared by all the queries run in a
session. The consumption in those can not be attributed only to a given
query and hence should not be reported any way.

The memory consumption is reported as "Planning Memory" property in
EXPLAIN ANALYZE output.

Ashutosh Bapat
---
 src/backend/commands/explain.c | 12 ++++++++++--
 src/backend/commands/prepare.c |  7 ++++++-
 src/backend/utils/mmgr/mcxt.c  | 19 +++++++++++++++++++
 src/include/commands/explain.h |  3 ++-
 src/include/utils/memutils.h   |  1 +
 5 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..9f859949f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions,
 					planduration;
 		BufferUsage bufusage_start,
 					bufusage;
+		Size		mem_consumed;
 
 		if (es->buffers)
 			bufusage_start = pgBufferUsage;
 		INSTR_TIME_SET_CURRENT(planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 		/* plan the query */
 		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+						- mem_consumed;
 
 		/* calc differences of buffer counters. */
 		if (es->buffers)
@@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 
 		/* run it (if needed) and produce output */
 		ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
-					   &planduration, (es->buffers ? &bufusage : NULL));
+					   &planduration, (es->buffers ? &bufusage : NULL), &mem_consumed);
 	}
 }
 
@@ -527,7 +531,7 @@ void
 ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage)
+			   const BufferUsage *bufusage, const Size *mem_consumed)
 {
 	DestReceiver *dest;
 	QueryDesc  *queryDesc;
@@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
 		ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es);
+
+		if (mem_consumed)
+			ExplainPropertyUInteger("Planning Memory", "bytes",
+										(uint64) *mem_consumed, es);
 	}
 
 	/* Print info about runtime of triggers */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..84544ce481 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	Size		mem_consumed;
 
 	if (es->buffers)
 		bufusage_start = pgBufferUsage;
 	INSTR_TIME_SET_CURRENT(planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 	/* Look it up in the hash table */
 	entry = FetchPreparedStatement(execstmt->name, true);
@@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 	INSTR_TIME_SET_CURRENT(planduration);
 	INSTR_TIME_SUBTRACT(planduration, planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+					- mem_consumed;
 
 	/* calc differences of buffer counters. */
 	if (es->buffers)
@@ -640,7 +644,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 		if (pstmt->commandType != CMD_UTILITY)
 			ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
-						   &planduration, (es->buffers ? &bufusage : NULL));
+						   &planduration, (es->buffers ? &bufusage : NULL),
+						   &mem_consumed);
 		else
 			ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
 							  paramLI, queryEnv);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9fc83f11f6..43af271f33 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -747,6 +747,25 @@ MemoryContextStatsDetail(MemoryContext context, int max_children,
 								 grand_totals.totalspace - grand_totals.freespace)));
 }
 
+/*
+ * Compute the memory used in the given context and its children.
+ *
+ * XXX: Instead of this separate function we could modify
+ * MemoryContextStatsDetail() to report used memory and disable printing the
+ * detailed stats.
+ */
+extern Size
+MemoryContextMemUsed(MemoryContext context)
+{
+	MemoryContextCounters grand_totals;
+
+	memset(&grand_totals, 0, sizeof(grand_totals));
+
+	MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, false);
+
+	return grand_totals.totalspace - grand_totals.freespace;
+}
+
 /*
  * MemoryContextStatsInternal
  *		One recursion level for MemoryContextStats
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..21e3d2f309 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -92,7 +92,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
 						   ExplainState *es, const char *queryString,
 						   ParamListInfo params, QueryEnvironment *queryEnv,
 						   const instr_time *planduration,
-						   const BufferUsage *bufusage);
+						   const BufferUsage *bufusage,
+						   const Size *mem_consumed);
 
 extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
 extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 21640d62a6..d7c477f229 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -92,6 +92,7 @@ extern void MemoryContextStatsDetail(MemoryContext context, int max_children,
 									 bool print_to_stderr);
 extern void MemoryContextAllowInCriticalSection(MemoryContext context,
 												bool allow);
+extern Size MemoryContextMemUsed(MemoryContext context);
 
 #ifdef MEMORY_CONTEXT_CHECKING
 extern void MemoryContextCheck(MemoryContext context);
-- 
2.25.1

From 99f06a021fc4330ba1fcbf61f4c0e085b96df500 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 22 Jun 2023 14:27:14 +0530
Subject: [PATCH 2/7] Avoid translating RestrictInfo repeatedly

RestrictInfo to be applied to the child rels (including the child join rels)
are obtained by translating RestrictInfo applicable to the parent rel. Since
these translations are not tracked, the same RestrictInfo may get translated
multiple times for the same parent and child pair. When using partitionwise
join this can happen as many times as the number of possible join orders
between the partitioned tables.

Every translated RestrictInfo is stored in the top parent's RestrictInfo in a
list. Every translated RestrictInfo contains a pointer to the top parent's
RestrictInfo to track multilevel translations. RestrictInfo::required_relids is
used as a key to search a given translation. adjust_appendrel_attrs_mutator()
first looks up an existing translations when translating a RestrictInfo and
creates a new one only when one doesn't exist.

Ashutosh Bapat
---
 src/backend/optimizer/util/appendinfo.c   | 47 ++++++++++++++++++++++-
 src/backend/optimizer/util/restrictinfo.c |  2 +
 src/include/nodes/pathnodes.h             |  7 ++++
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..70a5ff2750 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -217,6 +217,7 @@ adjust_appendrel_attrs_mutator(Node *node,
 {
 	AppendRelInfo **appinfos = context->appinfos;
 	int			nappinfos = context->nappinfos;
+	PlannerInfo	   *root = context->root;
 	int			cnt;
 
 	if (node == NULL)
@@ -445,7 +446,46 @@ adjust_appendrel_attrs_mutator(Node *node,
 	if (IsA(node, RestrictInfo))
 	{
 		RestrictInfo *oldinfo = (RestrictInfo *) node;
-		RestrictInfo *newinfo = makeNode(RestrictInfo);
+		Relids child_required_relids = adjust_child_relids(oldinfo->required_relids,
+															nappinfos, appinfos);
+		RestrictInfo   *parent_rinfo;
+		ListCell   *lc;
+		RestrictInfo *newinfo;
+		MemoryContext	old_context;
+		
+		/*
+		 * If the adjusted RestrictInfo already exists, use it otherwise create
+		 * a new one. This avoids translating the same RestrictInfo again and
+		 * again wasting memory especially in partitionwise joins.
+		 */
+
+		if (bms_equal(child_required_relids, oldinfo->required_relids))
+		{
+			/* If the clause does not need any translation. */
+			bms_free(child_required_relids);
+			return (Node *) oldinfo;
+		}
+
+		/*
+		 * Check if we already have the RestrictInfo for the given child in the
+		 * topmost parent's RestrictInfo.
+		 */
+		parent_rinfo = oldinfo->parent_rinfo ? oldinfo->parent_rinfo : oldinfo;
+		foreach (lc, parent_rinfo->child_rinfos)
+		{
+			newinfo = lfirst(lc);
+
+			if (bms_equal(newinfo->required_relids, child_required_relids))
+			{
+				bms_free(child_required_relids);
+				return (Node *) newinfo;
+			}
+		}
+		bms_free(child_required_relids);
+
+		/* Translate in a long lasting memory context. */
+		old_context = MemoryContextSwitchTo(root->planner_cxt);
+		newinfo = makeNode(RestrictInfo);
 
 		/* Copy all flat-copiable fields, notably including rinfo_serial */
 		memcpy(newinfo, oldinfo, sizeof(RestrictInfo));
@@ -491,6 +531,11 @@ adjust_appendrel_attrs_mutator(Node *node,
 		newinfo->right_bucketsize = -1;
 		newinfo->left_mcvfreq = -1;
 		newinfo->right_mcvfreq = -1;
+		newinfo->parent_rinfo = parent_rinfo;
+		parent_rinfo->child_rinfos = lappend(parent_rinfo->child_rinfos,
+											 newinfo);
+
+		MemoryContextSwitchTo(old_context);
 
 		return (Node *) newinfo;
 	}
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index d6d26a2b51..7a97ab7407 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -246,6 +246,8 @@ make_restrictinfo_internal(PlannerInfo *root,
 
 	restrictinfo->left_hasheqoperator = InvalidOid;
 	restrictinfo->right_hasheqoperator = InvalidOid;
+	restrictinfo->child_rinfos = NIL;
+	restrictinfo->parent_rinfo = NULL;
 
 	return restrictinfo;
 }
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c17b53f7ad..a064629626 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2655,6 +2655,13 @@ typedef struct RestrictInfo
 	/* hash equality operators used for memoize nodes, else InvalidOid */
 	Oid			left_hasheqoperator pg_node_attr(equal_ignore);
 	Oid			right_hasheqoperator pg_node_attr(equal_ignore);
+
+	/* Only one of these two can be set. */
+	List	   *child_rinfos pg_node_attr(equal_ignore, copy_as(NIL)); /* RestrictInfos derived for children. */
+	struct RestrictInfo *parent_rinfo pg_node_attr(equal_ignore, copy_as(NULL));		/* Parent restrictinfo this
+											 * RestrictInf is derived from.
+											 */
+
 } RestrictInfo;
 
 /*
-- 
2.25.1

Reply via email to