Hi All, In try_partitionwise_join() we create SpecialJoinInfo structures for child joins by translating SpecialJoinInfo structures applicable to the parent join. These SpecialJoinInfos are not used outside try_partitionwise_join() but we do not free memory allocated to those. In try_partitionwise_join() we create as many SpecialJoinInfos as the number of partitions. But try_partitionwise_join() itself is called as many times as the number of join orders for a given join. Thus the number of child SpecialJoinInfos that are created increases exponentially with the number of partitioned tables being joined.
The attached patch (0002) fixes this issue by 1. declaring child SpecialJoinInfo as a local variable, thus allocating memory on the stack instead of CurrentMemoryContext. The memory on the stack is freed automatically. 2. Freeing the Relids in SpecialJoinInfo explicitly after SpecialJoinInfo has been used. We can not free the object trees in SpecialJoinInfo since those may be referenced in the paths. Similar to my previous emails [1], the memory consumption for given queries is measured using attached patch (0001). The table definitions and helper functions can be found in setup.sql and queries.sql. Following table shows the reduction in the memory using the attached patch (0002). Number of | without | | | Absolute | joining tables | patch | with patch | % diff | diff | -------------------------------------------------------------------- 2 | 40.9 MiB | 39.9 MiB | 2.27% | 925.7 KiB | 3 | 151.7 MiB | 142.6 MiB | 5.97% | 9.0 MiB | 4 | 464.0 MiB | 418.4 MiB | 9.83% | 45.6 MiB | 5 | 1663.9 MiB | 1419.4 MiB | 14.69% | 244.5 MiB | Since the memory consumed by SpecialJoinInfos is exponentially proportional to the number of tables being joined, we see that the patch is able to save more memory at higher number of tables joined. The attached patch (0002) frees members of individual SpecialJoinInfos. It might have some impact on the planning time. We may improve that by allocating the members in a separate memory context. The memory context will be allocated and deleted in each invocation of try_partitionwise_join(). I am assuming that deleting a memory context is more efficient than freeing bits of memory multiple times. But I have not tried that approach. Another approach would be to track the SpecialJoinInfo translations (similar to RestrictListInfo translations proposed in other email thread [2]) and avoid repeatedly translating the same SpecialJoinInfo. The approach will have similar disadvantages that it may increase size of SpecialJoinInfo structure even when planning the queries which do not need partitionwise join. So I haven't tried that approach yet. At this stage, I am looking for opinions on 1. whether it's worth reducing this memory 2. any suggestions/comments on which approach from above is more suitable or if there are other approaches References [1] https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com [2] https://www.postgresql.org/message-id/CAExHW5s=bclmmq8n_bn6iu+pjau0ds3z_6dn6ile69esmsp...@mail.gmail.com -- Best Wishes, Ashutosh Bapat
queries.sql
Description: application/sql
setup.sql
Description: application/sql
From a5d8492517121ddc0366a4a853e041e3d4ab663d Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Date: Mon, 24 Jul 2023 20:20:53 +0530 Subject: [PATCH 2/2] Reduce memory used by child SpecialJoinInfo The SpecialJoinInfo applicable to a child join is computed by translating the same applicable to the parent join in try_partitionwise_join(). The child SpecialJoinInfo is not needed once the child join RelOptInfo is created and paths are added to it. Use a local variable to hold child SpecialJoinInfo so that it doesn't need memory allocated separately. Also free the memory allocated to various Bitmapsets in SpecialJoinInfo. Those are not referenced anywhere. But we do not free the memory allocated to expression trees of operator OID list since those may be referenced in paths or other objects. --- src/backend/optimizer/path/joinrels.c | 66 +++++++++++++++++++-------- 1 file changed, 46 insertions(+), 20 deletions(-) diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 015a0b3cbe..51a811c6d9 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -42,9 +42,11 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, SpecialJoinInfo *parent_sjinfo, List *parent_restrictlist); -static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root, - SpecialJoinInfo *parent_sjinfo, - Relids left_relids, Relids right_relids); +static void build_child_join_sjinfo(PlannerInfo *root, + SpecialJoinInfo *parent_sjinfo, + Relids left_relids, Relids right_relids, + SpecialJoinInfo *child_sjinfo); +static void free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo); static void compute_partition_bounds(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, SpecialJoinInfo *parent_sjinfo, @@ -1540,7 +1542,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *child_rel2; bool rel1_empty; bool rel2_empty; - SpecialJoinInfo *child_sjinfo; + SpecialJoinInfo child_sjinfo; List *child_restrictlist; RelOptInfo *child_joinrel; AppendRelInfo **appinfos; @@ -1635,9 +1637,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, * Construct SpecialJoinInfo from parent join relations's * SpecialJoinInfo. */ - child_sjinfo = build_child_join_sjinfo(root, parent_sjinfo, - child_rel1->relids, - child_rel2->relids); + build_child_join_sjinfo(root, parent_sjinfo, child_rel1->relids, + child_rel2->relids, &child_sjinfo); /* Find the AppendRelInfo structures */ appinfos = find_appinfos_by_relids(root, @@ -1660,7 +1661,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, { child_joinrel = build_child_join_rel(root, child_rel1, child_rel2, joinrel, child_restrictlist, - child_sjinfo); + &child_sjinfo); joinrel->part_rels[cnt_parts] = child_joinrel; joinrel->live_parts = bms_add_member(joinrel->live_parts, cnt_parts); joinrel->all_partrels = bms_add_members(joinrel->all_partrels, @@ -1674,10 +1675,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, /* And make paths for the child join */ populate_joinrel_with_paths(root, child_rel1, child_rel2, - child_joinrel, child_sjinfo, + child_joinrel, &child_sjinfo, child_restrictlist); pfree(appinfos); + free_child_sjinfo_members(&child_sjinfo); } } @@ -1686,42 +1688,66 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, * SpecialJoinInfo for the join between parents. left_relids and right_relids * are the relids of left and right side of the join respectively. */ -static SpecialJoinInfo * +static void build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo, - Relids left_relids, Relids right_relids) + Relids left_relids, Relids right_relids, + SpecialJoinInfo *child_sjinfo) { - SpecialJoinInfo *sjinfo = makeNode(SpecialJoinInfo); AppendRelInfo **left_appinfos; int left_nappinfos; AppendRelInfo **right_appinfos; int right_nappinfos; - memcpy(sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo)); + memcpy(child_sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo)); left_appinfos = find_appinfos_by_relids(root, left_relids, &left_nappinfos); right_appinfos = find_appinfos_by_relids(root, right_relids, &right_nappinfos); - sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand, + child_sjinfo->min_lefthand = adjust_child_relids(child_sjinfo->min_lefthand, left_nappinfos, left_appinfos); - sjinfo->min_righthand = adjust_child_relids(sjinfo->min_righthand, + child_sjinfo->min_righthand = adjust_child_relids(child_sjinfo->min_righthand, right_nappinfos, right_appinfos); - sjinfo->syn_lefthand = adjust_child_relids(sjinfo->syn_lefthand, + child_sjinfo->syn_lefthand = adjust_child_relids(child_sjinfo->syn_lefthand, left_nappinfos, left_appinfos); - sjinfo->syn_righthand = adjust_child_relids(sjinfo->syn_righthand, + child_sjinfo->syn_righthand = adjust_child_relids(child_sjinfo->syn_righthand, right_nappinfos, right_appinfos); /* outer-join relids need no adjustment */ - sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root, - (Node *) sjinfo->semi_rhs_exprs, + child_sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root, + (Node *) child_sjinfo->semi_rhs_exprs, right_nappinfos, right_appinfos); pfree(left_appinfos); pfree(right_appinfos); +} + +/* + * free_child_sjinfo_members + * Free memory consumed by members of a child SpecialJoinInfo. + */ +static void +free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo) +{ + /* + * The relids are used only for comparison and their references are not + * stored anywhere. Free those. + */ + bms_free(child_sjinfo->min_lefthand); + bms_free(child_sjinfo->min_righthand); + bms_free(child_sjinfo->syn_lefthand); + bms_free(child_sjinfo->syn_righthand); + bms_free(child_sjinfo->commute_above_l); + bms_free(child_sjinfo->commute_above_r); + bms_free(child_sjinfo->commute_below_l); + bms_free(child_sjinfo->commute_below_r); - return sjinfo; + /* + * But the list of operator OIDs and the list of expressions may be + * referenced somewhere else. Do not free those. + */ } /* -- 2.25.1
From 3b9367f743923263b56054a79f7f36cd710904e1 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Date: Wed, 12 Jul 2023 14:34:14 +0530 Subject: [PATCH 1/2] Report memory used for planning a query in EXPLAIN ANALYZE The memory used in the CurrentMemoryContext and its children is sampled before and after calling pg_plan_query() from ExplainOneQuery(). The difference in the two samples is reported as the memory consumed while planning the query. This may not account for the memory allocated in memory contexts which are not children of CurrentMemoryContext. These contexts are usually other long lived contexts, e.g. CacheMemoryContext, which are shared by all the queries run in a session. The consumption in those can not be attributed only to a given query and hence should not be reported any way. The memory consumption is reported as "Planning Memory" property in EXPLAIN ANALYZE output. Ashutosh Bapat --- src/backend/commands/explain.c | 12 ++++++++++-- src/backend/commands/prepare.c | 7 ++++++- src/backend/utils/mmgr/mcxt.c | 19 +++++++++++++++++++ src/include/commands/explain.h | 3 ++- src/include/utils/memutils.h | 1 + 5 files changed, 38 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 8570b14f62..9f859949f0 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions, planduration; BufferUsage bufusage_start, bufusage; + Size mem_consumed; if (es->buffers) bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext); /* plan the query */ plan = pg_plan_query(query, queryString, cursorOptions, params); INSTR_TIME_SET_CURRENT(planduration); INSTR_TIME_SUBTRACT(planduration, planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) + - mem_consumed; /* calc differences of buffer counters. */ if (es->buffers) @@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions, /* run it (if needed) and produce output */ ExplainOnePlan(plan, into, es, queryString, params, queryEnv, - &planduration, (es->buffers ? &bufusage : NULL)); + &planduration, (es->buffers ? &bufusage : NULL), &mem_consumed); } } @@ -527,7 +531,7 @@ void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv, const instr_time *planduration, - const BufferUsage *bufusage) + const BufferUsage *bufusage, const Size *mem_consumed) { DestReceiver *dest; QueryDesc *queryDesc; @@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, double plantime = INSTR_TIME_GET_DOUBLE(*planduration); ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es); + + if (mem_consumed) + ExplainPropertyUInteger("Planning Memory", "bytes", + (uint64) *mem_consumed, es); } /* Print info about runtime of triggers */ diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c index 18f70319fc..84544ce481 100644 --- a/src/backend/commands/prepare.c +++ b/src/backend/commands/prepare.c @@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, instr_time planduration; BufferUsage bufusage_start, bufusage; + Size mem_consumed; if (es->buffers) bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext); /* Look it up in the hash table */ entry = FetchPreparedStatement(execstmt->name, true); @@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, INSTR_TIME_SET_CURRENT(planduration); INSTR_TIME_SUBTRACT(planduration, planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) + - mem_consumed; /* calc differences of buffer counters. */ if (es->buffers) @@ -640,7 +644,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, if (pstmt->commandType != CMD_UTILITY) ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv, - &planduration, (es->buffers ? &bufusage : NULL)); + &planduration, (es->buffers ? &bufusage : NULL), + &mem_consumed); else ExplainOneUtility(pstmt->utilityStmt, into, es, query_string, paramLI, queryEnv); diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c index 9fc83f11f6..43af271f33 100644 --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -747,6 +747,25 @@ MemoryContextStatsDetail(MemoryContext context, int max_children, grand_totals.totalspace - grand_totals.freespace))); } +/* + * Compute the memory used in the given context and its children. + * + * XXX: Instead of this separate function we could modify + * MemoryContextStatsDetail() to report used memory and disable printing the + * detailed stats. + */ +extern Size +MemoryContextMemUsed(MemoryContext context) +{ + MemoryContextCounters grand_totals; + + memset(&grand_totals, 0, sizeof(grand_totals)); + + MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, false); + + return grand_totals.totalspace - grand_totals.freespace; +} + /* * MemoryContextStatsInternal * One recursion level for MemoryContextStats diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h index 3d3e632a0c..21e3d2f309 100644 --- a/src/include/commands/explain.h +++ b/src/include/commands/explain.h @@ -92,7 +92,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv, const instr_time *planduration, - const BufferUsage *bufusage); + const BufferUsage *bufusage, + const Size *mem_consumed); extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc); extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc); diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 21640d62a6..d7c477f229 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -92,6 +92,7 @@ extern void MemoryContextStatsDetail(MemoryContext context, int max_children, bool print_to_stderr); extern void MemoryContextAllowInCriticalSection(MemoryContext context, bool allow); +extern Size MemoryContextMemUsed(MemoryContext context); #ifdef MEMORY_CONTEXT_CHECKING extern void MemoryContextCheck(MemoryContext context); -- 2.25.1