Hi All, If add_path() and add_partial_path() do not find a new path to be superior to any existing paths, they free the new path. They free an existing path if it is found to be inferior to the new path. But not all paths surviving in a RelOptInfo are used to create paths for relations which use it as input. Further add_path and add_partial_path do not free the whole path tree but just the topmost pathnode. The subpath nodes are not freed because they may be referenced by other paths. The subpaths continue to occupy memory even if they are not used anywhere. As we build the relation tree upwards (base, join, upper relations) more and more such paths are accumulated and continue to consume memory till the statement ends. With partitionwise join involving partitioned tables with thousands of partitions this memory consumption increases proportional to the number of partitions.
Attached WIP patches address this issue by adding infrastructure to track references to paths and free unreferenced paths once they can be deemed as useless. A new member Path::ref_count in a pathnode tracks how many other objects, like pathlists in RelOptInfo and other paths, reference that pathnode. As the path nodes are referenced they are "linked" using link_path() to referencing objects. When the path nodes are no longer referenced they are "unlinked" using unlink_path() from the referencing objects. Path nodes are freed using free_path(). The function unlinks the sub path nodes so that they can be freed when their reference count drops to 0. The paths whose references reach 0 during unlinking are freed automatically using free_path(). Patch 0002 adds this infrastructure. 0003 and 0004 use these functions in example cases. With these patches the memory consumption numbers look like below. Experiment ---------- Memory consumption is measured using a method similar to the one described in [1]. The changes to EXPLAIN ANALYZE to report planning memory are in 0001. Memory consumed when planning a self-join query is measured. The queries involve partitioned and unpartitioned tables respectively. The partitioned table has 1000 partitions in it. The table definitions and helper function can be found in setup.sql and the queries can be found in queries.sql. This is the simplest setup. Higher savings may be seen with more complex setups involving indexes, SQL operators and clauses. Table 1: Join between unpartitioned tables Number of tables | without patch | with patch | % reduction | being joined | | | | -------------------------------------------------------------- 2 | 29.0 KiB | 28.9 KiB | 0.66% | 3 | 79.1 KiB | 76.7 KiB | 3.03% | 4 | 208.6 KiB | 198.2 KiB | 4.97% | 5 | 561.6 KiB | 526.3 KiB | 6.28% | Table 2: join between partitioned tables with partitionwise join enabled (enable_partitionwise_join = true). Number of tables | without patch | with patch | % reduction | being joined | | | | ---------------------------------------------------------------- 2 | 40.3 MiB | 40.0 MiB | 0.70% | 3 | 146.9 MiB | 143.1 MiB | 2.55% | 4 | 445.4 MiB | 430.4 MiB | 3.38% | 5 | 1563.3 MiB | 1517.6 MiB | 2.92% | The patch is not complete because of following issues: a. 0003 and 0004 patches do not cover all the path nodes. I have covered only those which I encountered in the queries I ran. If we accept this proposal, I will work on covering all the path nodes. b. It does not cover the entire RelOptInfo tree that the planner creates. The paths in a lower level RelOptInfo can be deemed as useless only when path creation for all the immediate upper RelOptInfos is complete. Thus we need access to both upper and lower level RelOptInfos at the same time. The RelOptInfos created while planning join are available in standard_join_search(). Thus it's possible to free unused paths listed in all the RelOptInfos except the topmost RelOptInfo in this function as done in the patch. But the topmost RelOptInfo acts as an input to upper RelOptInfo in grouping_planner() where we don't have access to the RelOptInfos at lower levels in the join tree. Hence we can't free the unused paths from topmost RelOptInfo and cascade that effect down the RelOptInfo tree. This is the reason why we don't see memory reduction in case of 2-way join. This is also the reason why the numbers above are lower than the actual memory that can be saved. If we decide to move ahead with this approach, I will work on this too. c. The patch does not call link_path and unlink_path in all the required places. We will need some work to identify such places, to build infrastructure and methods to identify such places in future. Another approach to fixing this would be to use separate memory context for creating path nodes and then deleting the entire memory context at the end of planning once the plan is created. We will need to reset path lists as well as cheapest_path members in RelOptInfos as well. This will possibly free up more memory and might be faster. But I have not tried it. The approach taken in the patch has an advantage over this one i.e. the paths can be freed at any stage in the planning using free_unused_paths_from_rel() implemented in the patch. Thus we can monitor the memory consumption and trigger garbage collection when it crosses a certain threashold. Or we may implement both the approaches to clean every bit of paths at the end of planning while garbage collecting pathnodes when memory consumption goes beyond threashold. The reference mechanism may have other usages as well. Suggestions/comments welcome. References 1. https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com -- Best Wishes, Ashutosh Bapat
setup.sql
Description: application/sql
queries.sql
Description: application/sql
From 3d24fca1b861741949225b40f6df8892409e8d00 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Date: Mon, 17 Jul 2023 15:10:45 +0530 Subject: [PATCH 4/4] Local variables pointing to path node used repeatedly In match_unsorted_outer() we create a materialize path for inner relation and pass it to try_nestloop_path repeatedly. Link and unlink the path to and from the local variable to keep track of it. --- src/backend/optimizer/path/joinpath.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index f047ad9ba4..d6560aa6d5 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -1762,12 +1762,14 @@ match_unsorted_outer(PlannerInfo *root, /* * Consider materializing the cheapest inner path, unless * enable_material is off or the path in question materializes its - * output anyway. + * output anyway. Link the path to a local reference so that it won't be + * freed by add_path if the surrounding nest path is freed by add_path. + * It will get freed if not used later. */ if (enable_material && inner_cheapest_total != NULL && !ExecMaterializesOutput(inner_cheapest_total->pathtype)) - matpath = (Path *) - create_material_path(innerrel, inner_cheapest_total); + link_path(&matpath, + (Path *) create_material_path(innerrel, inner_cheapest_total)); } foreach(lc1, outerrel->pathlist) @@ -1883,6 +1885,9 @@ match_unsorted_outer(PlannerInfo *root, false); } + /* Materialized inner path won't be used anymore. Unlink it */ + unlink_path(&matpath); + /* * Consider partial nestloop and mergejoin plan if outerrel has any * partial path and the joinrel is parallel-safe. However, we can't -- 2.25.1
From 062263008a413dd3561246f2f8793e7674a56fd1 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Date: Wed, 12 Jul 2023 14:34:14 +0530 Subject: [PATCH 1/4] Report memory used for planning a query in EXPLAIN ANALYZE The memory used in the CurrentMemoryContext and its children is sampled before and after calling pg_plan_query() from ExplainOneQuery(). The difference in the two samples is reported as the memory consumed while planning the query. This may not account for the memory allocated in memory contexts which are not children of CurrentMemoryContext. These contexts are usually other long lived contexts, e.g. CacheMemoryContext, which are shared by all the queries run in a session. The consumption in those can not be attributed only to a given query and hence should not be reported any way. The memory consumption is reported as "Planning Memory" property in EXPLAIN ANALYZE output. Ashutosh Bapat --- src/backend/commands/explain.c | 12 ++++++++++-- src/backend/commands/prepare.c | 7 ++++++- src/backend/utils/mmgr/mcxt.c | 19 +++++++++++++++++++ src/include/commands/explain.h | 3 ++- src/include/utils/memutils.h | 1 + 5 files changed, 38 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 8570b14f62..9f859949f0 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions, planduration; BufferUsage bufusage_start, bufusage; + Size mem_consumed; if (es->buffers) bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext); /* plan the query */ plan = pg_plan_query(query, queryString, cursorOptions, params); INSTR_TIME_SET_CURRENT(planduration); INSTR_TIME_SUBTRACT(planduration, planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) + - mem_consumed; /* calc differences of buffer counters. */ if (es->buffers) @@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions, /* run it (if needed) and produce output */ ExplainOnePlan(plan, into, es, queryString, params, queryEnv, - &planduration, (es->buffers ? &bufusage : NULL)); + &planduration, (es->buffers ? &bufusage : NULL), &mem_consumed); } } @@ -527,7 +531,7 @@ void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv, const instr_time *planduration, - const BufferUsage *bufusage) + const BufferUsage *bufusage, const Size *mem_consumed) { DestReceiver *dest; QueryDesc *queryDesc; @@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, double plantime = INSTR_TIME_GET_DOUBLE(*planduration); ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es); + + if (mem_consumed) + ExplainPropertyUInteger("Planning Memory", "bytes", + (uint64) *mem_consumed, es); } /* Print info about runtime of triggers */ diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c index 18f70319fc..84544ce481 100644 --- a/src/backend/commands/prepare.c +++ b/src/backend/commands/prepare.c @@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, instr_time planduration; BufferUsage bufusage_start, bufusage; + Size mem_consumed; if (es->buffers) bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext); /* Look it up in the hash table */ entry = FetchPreparedStatement(execstmt->name, true); @@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, INSTR_TIME_SET_CURRENT(planduration); INSTR_TIME_SUBTRACT(planduration, planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) + - mem_consumed; /* calc differences of buffer counters. */ if (es->buffers) @@ -640,7 +644,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, if (pstmt->commandType != CMD_UTILITY) ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv, - &planduration, (es->buffers ? &bufusage : NULL)); + &planduration, (es->buffers ? &bufusage : NULL), + &mem_consumed); else ExplainOneUtility(pstmt->utilityStmt, into, es, query_string, paramLI, queryEnv); diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c index 9fc83f11f6..43af271f33 100644 --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -747,6 +747,25 @@ MemoryContextStatsDetail(MemoryContext context, int max_children, grand_totals.totalspace - grand_totals.freespace))); } +/* + * Compute the memory used in the given context and its children. + * + * XXX: Instead of this separate function we could modify + * MemoryContextStatsDetail() to report used memory and disable printing the + * detailed stats. + */ +extern Size +MemoryContextMemUsed(MemoryContext context) +{ + MemoryContextCounters grand_totals; + + memset(&grand_totals, 0, sizeof(grand_totals)); + + MemoryContextStatsInternal(context, 0, false, 100, &grand_totals, false); + + return grand_totals.totalspace - grand_totals.freespace; +} + /* * MemoryContextStatsInternal * One recursion level for MemoryContextStats diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h index 3d3e632a0c..21e3d2f309 100644 --- a/src/include/commands/explain.h +++ b/src/include/commands/explain.h @@ -92,7 +92,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv, const instr_time *planduration, - const BufferUsage *bufusage); + const BufferUsage *bufusage, + const Size *mem_consumed); extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc); extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc); diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 21640d62a6..d7c477f229 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -92,6 +92,7 @@ extern void MemoryContextStatsDetail(MemoryContext context, int max_children, bool print_to_stderr); extern void MemoryContextAllowInCriticalSection(MemoryContext context, bool allow); +extern Size MemoryContextMemUsed(MemoryContext context); #ifdef MEMORY_CONTEXT_CHECKING extern void MemoryContextCheck(MemoryContext context); -- 2.25.1
From 21b6132417644a07054a85b52e402c31d5c23420 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Date: Mon, 17 Jul 2023 10:57:30 +0530 Subject: [PATCH 3/4] Actual code to use pathnode referencing infrastructure The commit uses the infrastructure built by the previous commit to references, unreference and free paths. The code is not completely mature. There are TODOs. --- src/backend/optimizer/util/pathnode.c | 126 +++++++++++++++----------- 1 file changed, 71 insertions(+), 55 deletions(-) diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 09745d19b1..65469637b2 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -586,12 +586,8 @@ add_path(RelOptInfo *parent_rel, Path *new_path) { parent_rel->pathlist = foreach_delete_current(parent_rel->pathlist, p1); - - /* - * Delete the data pointed-to by the deleted cell, if possible - */ - if (!IsA(old_path, IndexPath)) - pfree(old_path); + /* TODO: write a routine to unlink a path from the list node and delete the list node */ + unlink_path(&old_path); } else { @@ -614,12 +610,12 @@ add_path(RelOptInfo *parent_rel, Path *new_path) /* Accept the new path: insert it at proper place in pathlist */ parent_rel->pathlist = list_insert_nth(parent_rel->pathlist, insert_at, new_path); + new_path->ref_count++; + /* TODO: write a function to link_path in a List */ } else { - /* Reject and recycle the new path */ - if (!IsA(new_path, IndexPath)) - pfree(new_path); + free_path(new_path); } } @@ -822,7 +818,8 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path) { parent_rel->partial_pathlist = foreach_delete_current(parent_rel->partial_pathlist, p1); - pfree(old_path); + /* TODO use routine to unlink a path from the linked list */ + unlink_path(&old_path); } else { @@ -845,11 +842,13 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path) /* Accept the new path: insert it at proper place */ parent_rel->partial_pathlist = list_insert_nth(parent_rel->partial_pathlist, insert_at, new_path); + new_path->ref_count++; + /* TODO create a routine to link path in a list at a given place */ } else { /* Reject and recycle the new path */ - pfree(new_path); + free_path(new_path); } } @@ -1196,7 +1195,7 @@ create_bitmap_heap_path(PlannerInfo *root, pathnode->path.parallel_workers = parallel_degree; pathnode->path.pathkeys = NIL; /* always unordered */ - pathnode->bitmapqual = bitmapqual; + link_path(&(pathnode->bitmapqual), bitmapqual); cost_bitmap_heap_scan(&pathnode->path, root, rel, pathnode->path.param_info, @@ -1231,6 +1230,8 @@ create_bitmap_and_path(PlannerInfo *root, { Path *bitmapqual = (Path *) lfirst(lc); + /* TODO: find a better way to link a path in a linked list */ + bitmapqual->ref_count++; required_outer = bms_add_members(required_outer, PATH_REQ_OUTER(bitmapqual)); } @@ -1283,6 +1284,8 @@ create_bitmap_or_path(PlannerInfo *root, { Path *bitmapqual = (Path *) lfirst(lc); + /* TODO: find a better way to link a path in a linked list */ + bitmapqual->ref_count++; required_outer = bms_add_members(required_outer, PATH_REQ_OUTER(bitmapqual)); } @@ -1452,6 +1455,8 @@ create_append_path(PlannerInfo *root, { Path *subpath = (Path *) lfirst(l); + /* TODO: we should use the routine to link path to list */ + subpath->ref_count++; pathnode->path.parallel_safe = pathnode->path.parallel_safe && subpath->parallel_safe; @@ -1587,6 +1592,9 @@ create_merge_append_path(PlannerInfo *root, { Path *subpath = (Path *) lfirst(l); + /* TODO: use routine which links a path into a list */ + subpath->ref_count++; + pathnode->path.rows += subpath->rows; pathnode->path.parallel_safe = pathnode->path.parallel_safe && subpath->parallel_safe; @@ -1713,7 +1721,7 @@ create_material_path(RelOptInfo *rel, Path *subpath) pathnode->path.parallel_workers = subpath->parallel_workers; pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); cost_material(&pathnode->path, subpath->startup_cost, @@ -1747,7 +1755,7 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, pathnode->path.parallel_workers = subpath->parallel_workers; pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->hash_operators = hash_operators; pathnode->param_exprs = param_exprs; pathnode->singlerow = singlerow; @@ -1838,7 +1846,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, */ pathnode->path.pathkeys = NIL; - pathnode->subpath = subpath; + link_path(&(pathnode->subpath), subpath); pathnode->in_operators = sjinfo->semi_operators; pathnode->uniq_exprs = sjinfo->semi_rhs_exprs; @@ -1859,7 +1867,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, pathnode->path.total_cost = subpath->total_cost; pathnode->path.pathkeys = subpath->pathkeys; - rel->cheapest_unique_path = (Path *) pathnode; + /* TODO: Do we need to make sure that cheapest_unique_path is NULL. */ + link_path(&(rel->cheapest_unique_path), (Path *) pathnode); MemoryContextSwitchTo(oldcontext); @@ -1897,7 +1906,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, pathnode->path.total_cost = subpath->total_cost; pathnode->path.pathkeys = subpath->pathkeys; - rel->cheapest_unique_path = (Path *) pathnode; + link_path(&(rel->cheapest_unique_path), (Path *) pathnode); MemoryContextSwitchTo(oldcontext); @@ -1992,7 +2001,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, pathnode->path.total_cost = sort_path.total_cost; } - rel->cheapest_unique_path = (Path *) pathnode; + link_path(&(rel->cheapest_unique_path), (Path *) pathnode); MemoryContextSwitchTo(oldcontext); @@ -2023,7 +2032,7 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, required_outer); pathnode->path.parallel_aware = false; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->num_workers = subpath->parallel_workers; pathnode->path.pathkeys = pathkeys; pathnode->path.pathtarget = target ? target : rel->reltarget; @@ -2114,7 +2123,7 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, pathnode->path.parallel_workers = 0; pathnode->path.pathkeys = NIL; /* Gather has unordered result */ - pathnode->subpath = subpath; + link_path(&(pathnode->subpath), subpath); pathnode->num_workers = subpath->parallel_workers; pathnode->single_copy = false; @@ -2157,7 +2166,7 @@ create_subqueryscan_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, subpath->parallel_safe; pathnode->path.parallel_workers = subpath->parallel_workers; pathnode->path.pathkeys = pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); cost_subqueryscan(pathnode, root, rel, pathnode->path.param_info, trivial_pathtarget); @@ -2385,7 +2394,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel, pathnode->path.total_cost = total_cost; pathnode->path.pathkeys = pathkeys; - pathnode->fdw_outerpath = fdw_outerpath; + link_path(&pathnode->fdw_outerpath, fdw_outerpath); pathnode->fdw_private = fdw_private; return pathnode; @@ -2435,7 +2444,7 @@ create_foreign_join_path(PlannerInfo *root, RelOptInfo *rel, pathnode->path.total_cost = total_cost; pathnode->path.pathkeys = pathkeys; - pathnode->fdw_outerpath = fdw_outerpath; + link_path(&pathnode->fdw_outerpath, fdw_outerpath); pathnode->fdw_private = fdw_private; return pathnode; @@ -2480,7 +2489,7 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel, pathnode->path.total_cost = total_cost; pathnode->path.pathkeys = pathkeys; - pathnode->fdw_outerpath = fdw_outerpath; + link_path(&pathnode->fdw_outerpath, fdw_outerpath); pathnode->fdw_private = fdw_private; return pathnode; @@ -2610,8 +2619,8 @@ create_nestloop_path(PlannerInfo *root, pathnode->jpath.path.pathkeys = pathkeys; pathnode->jpath.jointype = jointype; pathnode->jpath.inner_unique = extra->inner_unique; - pathnode->jpath.outerjoinpath = outer_path; - pathnode->jpath.innerjoinpath = inner_path; + link_path(&(pathnode->jpath.outerjoinpath), outer_path); + link_path(&(pathnode->jpath.innerjoinpath), inner_path); pathnode->jpath.joinrestrictinfo = restrict_clauses; final_cost_nestloop(root, pathnode, workspace, extra); @@ -2674,8 +2683,8 @@ create_mergejoin_path(PlannerInfo *root, pathnode->jpath.path.pathkeys = pathkeys; pathnode->jpath.jointype = jointype; pathnode->jpath.inner_unique = extra->inner_unique; - pathnode->jpath.outerjoinpath = outer_path; - pathnode->jpath.innerjoinpath = inner_path; + link_path(&(pathnode->jpath.outerjoinpath), outer_path); + link_path(&(pathnode->jpath.innerjoinpath), inner_path); pathnode->jpath.joinrestrictinfo = restrict_clauses; pathnode->path_mergeclauses = mergeclauses; pathnode->outersortkeys = outersortkeys; @@ -2751,8 +2760,8 @@ create_hashjoin_path(PlannerInfo *root, pathnode->jpath.path.pathkeys = NIL; pathnode->jpath.jointype = jointype; pathnode->jpath.inner_unique = extra->inner_unique; - pathnode->jpath.outerjoinpath = outer_path; - pathnode->jpath.innerjoinpath = inner_path; + link_path(&(pathnode->jpath.outerjoinpath), outer_path); + link_path(&(pathnode->jpath.innerjoinpath), inner_path); pathnode->jpath.joinrestrictinfo = restrict_clauses; pathnode->path_hashclauses = hashclauses; /* final_cost_hashjoin will fill in pathnode->num_batches */ @@ -2793,6 +2802,9 @@ create_projection_path(PlannerInfo *root, Assert(subpp->path.parent == rel); subpath = subpp->subpath; Assert(!IsA(subpath, ProjectionPath)); + + /* Free it if not used anywhere else. */ + unlink_path((Path **) &subpp); } pathnode->path.pathtype = T_Result; @@ -2808,7 +2820,7 @@ create_projection_path(PlannerInfo *root, /* Projection does not change the sort order */ pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); /* * We might not need a separate Result node. If the input plan node type @@ -2927,21 +2939,21 @@ apply_projection_to_path(PlannerInfo *root, { GatherPath *gpath = (GatherPath *) path; - gpath->subpath = (Path *) - create_projection_path(root, - gpath->subpath->parent, - gpath->subpath, - target); + link_path(&gpath->subpath, + (Path *) create_projection_path(root, + gpath->subpath->parent, + gpath->subpath, + target)); } else { GatherMergePath *gmpath = (GatherMergePath *) path; - gmpath->subpath = (Path *) - create_projection_path(root, - gmpath->subpath->parent, - gmpath->subpath, - target); + link_path(&gmpath->subpath, + (Path *) create_projection_path(root, + gmpath->subpath->parent, + gmpath->subpath, + target)); } } else if (path->parallel_safe && @@ -2990,7 +3002,7 @@ create_set_projection_path(PlannerInfo *root, /* Projection does not change the sort order XXX? */ pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); /* * Estimate number of rows produced by SRFs for each row of input; if @@ -3059,7 +3071,7 @@ create_incremental_sort_path(PlannerInfo *root, pathnode->path.parallel_workers = subpath->parallel_workers; pathnode->path.pathkeys = pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); cost_incremental_sort(&pathnode->path, root, pathkeys, presorted_keys, @@ -3106,7 +3118,7 @@ create_sort_path(PlannerInfo *root, pathnode->path.parallel_workers = subpath->parallel_workers; pathnode->path.pathkeys = pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); cost_sort(&pathnode->path, root, pathkeys, subpath->total_cost, @@ -3152,7 +3164,7 @@ create_group_path(PlannerInfo *root, /* Group doesn't change sort ordering */ pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->groupClause = groupClause; pathnode->qual = qual; @@ -3210,7 +3222,7 @@ create_upper_unique_path(PlannerInfo *root, /* Unique doesn't change the input ordering */ pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->numkeys = numCols; /* @@ -3267,7 +3279,7 @@ create_agg_path(PlannerInfo *root, pathnode->path.pathkeys = subpath->pathkeys; /* preserves order */ else pathnode->path.pathkeys = NIL; /* output is unordered */ - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->aggstrategy = aggstrategy; pathnode->aggsplit = aggsplit; @@ -3330,7 +3342,7 @@ create_groupingsets_path(PlannerInfo *root, pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; pathnode->path.parallel_workers = subpath->parallel_workers; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); /* * Simplify callers by downgrading AGG_SORTED to AGG_PLAIN, and AGG_MIXED @@ -3568,7 +3580,7 @@ create_windowagg_path(PlannerInfo *root, /* WindowAgg preserves the input sort order */ pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->winclause = winclause; pathnode->qual = qual; pathnode->topwindow = topwindow; @@ -3638,7 +3650,7 @@ create_setop_path(PlannerInfo *root, pathnode->path.pathkeys = (strategy == SETOP_SORTED) ? subpath->pathkeys : NIL; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->cmd = cmd; pathnode->strategy = strategy; pathnode->distinctList = distinctList; @@ -3697,8 +3709,8 @@ create_recursiveunion_path(PlannerInfo *root, /* RecursiveUnion result is always unsorted */ pathnode->path.pathkeys = NIL; - pathnode->leftpath = leftpath; - pathnode->rightpath = rightpath; + link_path(&pathnode->leftpath, leftpath); + link_path(&pathnode->rightpath, rightpath); pathnode->distinctList = distinctList; pathnode->wtParam = wtParam; pathnode->numGroups = numGroups; @@ -3740,7 +3752,7 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel, */ pathnode->path.pathkeys = NIL; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->rowMarks = rowMarks; pathnode->epqParam = epqParam; @@ -3843,7 +3855,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel, pathnode->path.pathtarget->width = 0; } - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->operation = operation; pathnode->canSetTag = canSetTag; pathnode->nominalRelation = nominalRelation; @@ -3901,7 +3913,7 @@ create_limit_path(PlannerInfo *root, RelOptInfo *rel, pathnode->path.startup_cost = subpath->startup_cost; pathnode->path.total_cost = subpath->total_cost; pathnode->path.pathkeys = subpath->pathkeys; - pathnode->subpath = subpath; + link_path(&pathnode->subpath, subpath); pathnode->limitOffset = limitOffset; pathnode->limitCount = limitCount; pathnode->limitOption = limitOption; @@ -4182,6 +4194,7 @@ do { \ (path) = reparameterize_path_by_child(root, (path), child_rel); \ if ((path) == NULL) \ return NULL; \ + link_path(&(path), (path)); \ } while(0) #define REPARAMETERIZE_CHILD_PATH_LIST(pathlist) \ @@ -4469,11 +4482,14 @@ reparameterize_pathlist_by_child(PlannerInfo *root, if (path == NULL) { + /* TODO: unlink paths in the list */ list_free(result); return NIL; } + /* TODO: need a routine to link a path into a linked list */ result = lappend(result, path); + path->ref_count++; } return result; -- 2.25.1
From 38d79202803836613e8e6d366db1650e503de419 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Date: Mon, 17 Jul 2023 10:44:18 +0530 Subject: [PATCH 2/4] Basic infrastructure to link, unlink and free pathnodes add_path and add_partial_path free path nodes which they deem sub-optimal. But they just free the uppermost pathnode in the path. The subpaths continue to occupy memory even if they are not used anywhere. The subpath nodes are not freed because they may be referenced by other paths. This commit introduces the infrastructure to track references to paths. As the path nodes are referenced they are "linked" using link_path() to referencing objects. When the path nodes are no longer useful they are "unlinked" using unlink_path() from the referencing objects. The paths whose references reach 0 during unlinking are freed automatically using free_path(). The function unlinks the sub path nodes and can be called when freeing any path node. When the final path for join tree is chosen the paths linked to each participating relation are unlinked, thus ultimately getting freed if not used. TODO The functions free_path(), unlink_path() and link_path() have ereports to catch code paths which do not use these function correctly. They call errbacktrace() which is not used anywhere else. These ereport calls will need to be fixed for productization. --- src/backend/optimizer/path/allpaths.c | 77 +++++++++++++++ src/backend/optimizer/util/pathnode.c | 136 ++++++++++++++++++++++++++ src/include/nodes/pathnodes.h | 1 + src/include/optimizer/pathnode.h | 38 +++++++ 4 files changed, 252 insertions(+) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 9bdc70c702..f16fd4747c 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -3386,6 +3386,81 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist) } } +/* + * TODO: Need similar code to free paths in upper relations. + */ +static void +free_unused_paths_from_rel(RelOptInfo *rel) +{ + ListCell *lc_path; + int cnt_part; + + foreach(lc_path, rel->pathlist) + { + Path *path = (Path *) lfirst(lc_path); + + /* Free the path if none references it. */ + if (path->ref_count == 1) + { + /* TODO: use routine to unlink path from list */ + rel->pathlist = foreach_delete_current(rel->pathlist, lc_path); + unlink_path(&path); + } + } + + /* Do the same for partial pathlist. */ + foreach(lc_path, rel->partial_pathlist) + { + Path *path = (Path *) lfirst(lc_path); + + /* Free the path if none references it. */ + if (path->ref_count == 1) + { + rel->partial_pathlist = foreach_delete_current(rel->partial_pathlist, lc_path); + unlink_path(&path); + } + } + + /* + * TODO: We can perform this in generate_partitionwise_paths as well since + * by that time all the paths from partitions will be linked if used. + */ + + /* Free unused paths from the partition relations */ + for (cnt_part = 0; cnt_part < rel->nparts; cnt_part++) + { + free_unused_paths_from_rel(rel->part_rels[cnt_part]); + } +} + +/* + * Free unused paths from all the relations created while building the join tree. + */ +static void +free_unused_paths(PlannerInfo *root, int levels_needed) +{ + int cnt; + + for (cnt = levels_needed - 1; cnt >= 1; cnt--) + { + ListCell *lc; + + foreach (lc, root->join_rel_level[cnt]) + { + free_unused_paths_from_rel(lfirst(lc)); + } + } + + for (cnt = 0; cnt < root->simple_rel_array_size; cnt++) + { + RelOptInfo *rel = root->simple_rel_array[cnt]; + + /* Skip empty slots */ + if (rel != NULL) + free_unused_paths_from_rel(rel); + } +} + /* * standard_join_search * Find possible joinpaths for a query by successively finding ways @@ -3487,6 +3562,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) } } + free_unused_paths(root, levels_needed); + /* * We should have a single rel at the final level. */ diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 5f5596841c..09745d19b1 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -915,6 +915,142 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost, return true; } +void +free_path(Path *path) +{ + /* If the path is still referenced, freeing it would create a dangling pointer. */ + /* TODO: it could just be an Assert? */ + if (path->ref_count != 0) + { + ereport(WARNING, + (errcode(ERRCODE_INTERNAL_ERROR), + errbacktrace(), + errmsg("path node %p of type %d has reference count %d, can not be freed", + path, path->pathtype, path->ref_count))); + return; + } + + /* + * A path referenced in the parent relation's pathlist can't be freed. + * Ideally such a path should have ref_count >= 1. If this happens it + * indicates that the path was not linked somewhere, and yet unlinked + * (usually by free_path()). + */ + if (list_member_ptr(path->parent->pathlist, path)) + { + ereport(WARNING, + (errcode(ERRCODE_INTERNAL_ERROR), + errbacktrace(), + errmsg("path node %p of type %d has reference count %d but is linked in parent's pathlist, can not be freed", + path, path->pathtype, path->ref_count))); + return; + } + + /* Decrement the reference counts of paths referenced by this one. */ + switch(path->pathtype) + { + case T_SeqScan: + case T_IndexScan: + case T_IndexOnlyScan: + /* Simple paths reference no other path. */ + break; + + case T_MergeJoin: + case T_HashJoin: + case T_NestLoop: + { + JoinPath *jpath = (JoinPath *)path; + + unlink_path(&(jpath->outerjoinpath)); + unlink_path(&(jpath->innerjoinpath)); + } + break; + + case T_Append: + case T_MergeAppend: + { + AppendPath *appath = (AppendPath *)path; + ListCell *lc; + + foreach (lc, appath->subpaths) + { + Path *subpath = lfirst(lc); + + /* TODO use a routine to unlink path from list. */ + appath->subpaths = foreach_delete_current(appath->subpaths, lc); + unlink_path(&subpath); + } + } + break; + + case T_Gather: + { + GatherPath *gpath = (GatherPath *) path; + + unlink_path(&(gpath->subpath)); + } + break; + + case T_GatherMerge: + { + GatherMergePath *gmpath = (GatherMergePath *)path; + + unlink_path(&gmpath->subpath); + } + break; + + case T_BitmapHeapScan: + { + BitmapHeapPath *bhpath = (BitmapHeapPath *)path; + + unlink_path(&(bhpath->bitmapqual)); + } + break; + + case T_Material: + { + MaterialPath *mpath = (MaterialPath *)path; + + unlink_path(&(mpath->subpath)); + } + break; + + case T_Memoize: + { + MemoizePath *mpath = (MemoizePath *)path; + + unlink_path(&mpath->subpath); + } + break; + + case T_Result: + { + /* All Result paths except ProjectionPath are simple paths without any subpath. */ + if (IsA(path, ProjectionPath)) + { + ProjectionPath *ppath = (ProjectionPath *) path; + + unlink_path(&ppath->subpath); + } + } + break; + + default: + ereport(WARNING, + (errcode(ERRCODE_INTERNAL_ERROR), + errbacktrace(), + errmsg("unrecognized path type %d", path->pathtype))); + break; + } + + /* + * TODO: add_path does not free IndexPaths, but we do that here. Is there a + * hazard? + */ + + /* Now reclaim the memory. */ + pfree(path); +} /***************************************************************************** * PATH NODE CREATION ROUTINES diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index c17b53f7ad..2881085892 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -1631,6 +1631,7 @@ typedef struct Path /* sort ordering of path's output; a List of PathKey nodes; see above */ List *pathkeys; + int ref_count; } Path; /* Macro for extracting a path's parameterization relids; beware double eval */ diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 001e75b5b7..8df835368d 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -16,6 +16,7 @@ #include "nodes/bitmapset.h" #include "nodes/pathnodes.h" +#include "limits.h" /* @@ -295,6 +296,43 @@ extern Path *reparameterize_path(PlannerInfo *root, Path *path, double loop_count); extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path, RelOptInfo *child_rel); +extern void free_path(Path *path); + +static inline void +link_path(Path **path_link, Path *path) +{ + if (path->ref_count < 0) + ereport(WARNING, + (errcode(ERRCODE_INTERNAL_ERROR), + errbacktrace(), + errmsg("path node %p of type %d has negative reference count %d", + path, path->pathtype, path->ref_count))); + + path->ref_count++; + *path_link = path; + Assert(path->ref_count > 0 && path->ref_count <= INT_MAX); +} + +static inline void +unlink_path(Path **path_link) +{ + Path *path = *path_link; + + /* A path to be unlinked should have been linked before. */ + if (path->ref_count < 0) + ereport(WARNING, + (errcode(ERRCODE_INTERNAL_ERROR), + errbacktrace(), + errmsg("path node %p of type %d had negative reference count %d", + path, path->pathtype, path->ref_count))); + + path->ref_count--; + *path_link = NULL; + + /* Free path if none is referencing it. */ + if (path->ref_count == 0) + free_path(path); +} /* * prototypes for relnode.c -- 2.25.1