On Tue, Oct 18, 2016 at 9:09 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Oct 14, 2016 at 12:37 AM, Ashutosh Bapat
> <ashutosh.ba...@enterprisedb.com> wrote:
>>> Have you tested the effect of this patch on planner memory consumption
>>> with multi-way joins between tables with many partitions?  If you
>>> haven't, you probably should. (Testing runtime would be good, too.)
>>> Does it grow linearly?  Quadratically?  Exponentially?  Minor leaks
>>> don't matter, but if we're generating too much garbage we'll have to
>>> make sure it gets cleaned up soon enough to prevent runaway memory
>>> usage.
>>
>> I tried to check memory usage with various combinations of number of
>> partitions and number of relations being joined. For higher number of
>> relations being joined like 10 with 100 partitions, OOM killer kicked
>> in during the planning phase. I am suspecting
>> adjust_partitionrel_attrs() (changed that name to
>> adjust_join_appendrel_attrs() to be in sync with
>> adjust_appendrel_attrs()) to be the culprit. It copies expression
>> trees every time for joining two children. That's an exponentially
>> increasing number as the number of legal joins increases
>> exponentially. I am still investigating this.
>
> I think the root of this problem is that the existing paths share a
> lot more substructure than the ones created by the new code.  Without
> a partition-wise join, the incremental memory usage for a joinrel
> isn't any different whether the underlying rel is partitioned or not.
> If it's partitioned, we'll be pointing to an AppendPath; if not, we'll
> be pointing to some kind of Scan.  But the join itself creates exactly
> the same amount of new stuff regardless of what's underneath it.  With
> partitionwise join, that ceases to be true.  Every joinrel - and the
> number of those grows exponentially in the number of baserels, IIUC -
> needs its own list of paths for every member rel.  So if a
> non-partition-wise join created X paths, and there are K partitions, a
> partition-wise join creates X * K paths.  That's a lot.
>
> Although we might be able to save some memory by tightening things up
> here and there - for example, right now the planner isn't real smart
> about recycling paths that are evicted by add_path(), and there's
> probably other wastage as well - I suspect that what this shows is
> that the basic design of this patch is not going to be viable.
> Intuitively, it's often going to be the case that we want the "same
> plan" for every partition-set.  That is, if we have A JOIN B ON A.x =
> B.x JOIN C ON A.y = C.y, and if A, B, and C are all compatibly
> partitioned, then the result should be an Append plan with 100 join
> plans under it, and all 100 of those plans should be basically mirror
> images of each other.  Of course, that's not really right in general:
> for example, it could be that A1 is big and A2 is small while B1 is
> small and B2 is big, so that the right plan for (A1 JOIN B1) and for
> (A2 JOIN B2) are totally different from each other.  But in many
> practical cases we'll want to end up with a plan of precisely the same
> shape for all children, and the current design ignores this, expending
> both memory and CPU time to compute essentially-equivalent paths
> across all children.

I think there are going to be two kinds of partitioning use-cases.
The first is schemas carefully hand-crafted by DBAs, where every
partition is different from the others, and so is every join between
two partitions. There will be fewer partitions, but creating paths for
each join between partitions will be crucial from a performance point
of view. Consider, for example, systems which use partitions to
consolidate results from different sources for analytical purposes, or
sharding. If we consider the various points you have listed in [1] as
to why a partition is equivalent to a table, each join between
partitions is going to have very different characteristics and thus
deserves a set of paths of its own. Add to that the possibility of
partition pruning, or conditions affecting only particular partitions,
and the need for detailed planning is evident.

The other use of partitioning is to distribute the data and/or quickly
eliminate data by partition pruning. In that case, all partitions of a
given table will have very similar properties, and there is a good
chance that we will end up with the same plan for every partition and
for the joins between partitions. In such cases, I think it suffices
to create paths for just one, or maybe a handful, of the child joins
and repeat that plan for the other child joins. But then it also makes
sense to have a light-weight representation for partitions, as opposed
to partitions being full-fledged tables. If we had such a light-weight
representation, we might not even create RelOptInfos representing
joins between partitions, nor separate paths for each join between
partitions.

>
> One way of attacking this problem is to gang together partitions which
> are equivalent for planning purposes, as discussed in the paper "Join
> Optimization Techniques for Partitioned Tables" by Herodotou, Borisov,
> and Babu.  However, it's not exactly clear how to do this: we could
> gang together partitions that have the same index definitions, but the
> sizes of the heaps, the sizes of their indexes, and the row counts
> will vary from one partition to the next, and any of those things
> could cause the plan choice to be different for one partition vs. the
> next.  We could try to come up with heuristics for when those things
> are likely to be true.  For example, suppose we compute the set of
> partitions such that all joined relations have matching index
> definitions on all tables; then, we take the biggest table in the set
> and consider all tables more than half that size as part of one gang.
> The biggest table becomes the leader and we compute partition-wise
> paths for just that partition; the other members of the gang will
> eventually get a plan that is of the same shape, but we don't actually
> create that plan until after scan/join planning is concluded.

Section 5 of that paper talks about clustering partitions together for
a join only when the partitions match 1:m or n:1; in that case, it
clusters all the partitions of one relation that join a single
partition of the other relation. Your idea of ganging up partitions
with similar properties may reduce the number of paths we create, but,
as you have mentioned, how to gang them up is not very clear. There
are just too many factors, like the availability of indexes, the sizes
of the tables, and the sizes of intermediate results, which make it
difficult to identify the properties to gang up on. And even after we
do that, in the worst case we will still end up creating paths for all
partitions of all joins, i.e. an increase in paths proportional to the
number of partitions.

In section 6.3, the paper mentions that the number of paths retained
is linear in the number of child joins per parent join. So it's clear
that the paper never considered the linear increase in paths to be a
problem, or at least not a problem that that work had to solve. Given
that, it is surprising that their memory usage increased by only 7% to
10%. But 1. they might have been measuring total memory rather than
memory used by the planner, and 2. they experimented with PostgreSQL
8.3.7, which probably tried far fewer paths than the current
optimizer.

>
> Another idea is to try to reduce peak memory usage by performing
> planning separately for each partition-set.  For example, suppose we
> decide to do a partition-wise join of A, B, and C.  Initially, this
> gets represented as a PartitionJoinPath tree, like this:
>
> PartitionJoinPath
> -> AppendPath for A
> -> PartitionJoinPath
>   -> AppendPath for B
>   -> AppendPath for C
>
> Because we haven't created individual join paths for the members, this
> doesn't use much memory.  Somehow, we come up with a cost for the
> PartitionJoinPath; it probably won't be entirely accurate.  Once
> scan/join planning is concluded, if our final path contains a
> PartitionJoinPath, we go back and loop over the partitions.

A typical join tree will be composite: some portion partitioned and
some portion unpartitioned, or different portions partitioned by
different partitioning schemes. In such a case, inaccurate costs for a
PartitionJoinPath can affect the plan heavily, causing a suboptimal
path to be picked. Assuming that partitioning is most useful for large
data sets, choosing a suboptimal plan can be more dangerous than
consuming memory to create paths.

If we could come up with the cost of a PartitionJoinPath by some
method of interpolation, say by sampling a few child joins and
extrapolating their costs to the entire PartitionJoinPath, we could
use this method. But unless the partitions have very similar
characteristics, or characteristics whose differences let the costs be
guessed, I do not see how that can work. For example, while costing a
PartitionJoinPath with pathkeys, the costs will change a lot based on
whether the underlying relations have suitable indexes and on which
join methods are used, which in turn depends on the properties of the
partitions. The same is true for paths with parameterization. All such
paths are important when a partitioned join relation joins an
unpartitioned relation, or a partitioned relation with a different
partitioning scheme.

When each partition of the base relations being joined has different
properties, the cost of a join between one set of partitions can
differ from that of a join between another set. Not only that, the
costs attached to the various properties of the resulting paths, such
as pathkeys and parameterization, can vary a lot depending upon the
available indexes and the row estimates for each join. So we need to
come up with these cost estimates separately for each join between
partitions in order to cost the PartitionJoinPath as a whole. And if
we have to calculate those costs anyway to create the
PartitionJoinPath, we had better save them in paths rather than
recalculate them in a second round of planning for the joins between
partitions.

> For each
> partition, we switch to a new memory context, perform planning, copy
> the best path and its substructure back to the parent context, and
> then reset the context.

This could be rather tricky. It assumes that no code that creates
paths for joins allocates memory linked from an object in a context
that lives longer than the path-creation context. There is some code,
like create_join_clause() or make_canonical_pathkey(), which carefully
chooses which memory context to allocate in. But can we ensure that
everywhere? postgres_fdw, for example, allocates its PgFdwRelationInfo
in the current memory context and attaches it to the RelOptInfo, which
lives in the planner's original context. So, if we created a new
memory context for each partition, the fpinfos would be invalidated
when those contexts were released. It is not that we cannot enforce
such a restriction on memory usage during planning; rather, it is hard
to enforce, and bugs arising from it may go unnoticed. The GEQO
planner might have its own problems with this approach, and
third-party FDWs would pose a problem as well.
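To illustrate the hazard with a toy region allocator (plain C, not
PostgreSQL's MemoryContext machinery; all names are made up): an
object allocated in a short-lived per-partition context but attached
to a long-lived structure is silently clobbered when that context is
reset:

```c
/*
 * Toy region allocator.  Illustration only; not PostgreSQL code.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Region
{
	size_t		used;
	char		buf[1024];		/* first allocation is suitably aligned */
} Region;

static void *
region_alloc(Region *r, size_t size)
{
	void	   *p = r->buf + r->used;

	r->used += size;
	return p;
}

static void
region_reset(Region *r)
{
	/* Clobber the memory so stale pointers are visibly invalid. */
	memset(r->buf, 0, sizeof(r->buf));
	r->used = 0;
}

/* Stand-ins for postgres_fdw's PgFdwRelationInfo and the RelOptInfo. */
typedef struct FdwPrivate
{
	int			plan_id;
} FdwPrivate;

typedef struct RelInfo
{
	FdwPrivate *fdw_private;	/* lives as long as the planner does */
} RelInfo;
```

The dangerous part is exactly the last link: the RelOptInfo stand-in
outlives the region, but the object it points to does not.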

A possible solution would be to keep track of used paths with a
reference count. Once the paths for a given join tree have been
created, we can free the unused ones by traversing the pathlist of
each RelOptInfo. The attached patch has a prototype implementation of
this. There are some paths which are not linked to any RelOptInfo and
need slightly different treatment, but they can be handled too.
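Reduced to a minimal standalone sketch (plain C for illustration, not
the attached patch itself), the idea is: every structure that links to
a path bumps its count, and dropping the last reference reclaims the
path and, recursively, its children:

```c
/*
 * Minimal reference-counting sketch.  Illustration only; not the
 * attached patch.
 */
#include <assert.h>
#include <stdlib.h>

typedef struct Path
{
	int			num_refs;
	struct Path *subpath;		/* e.g. the input of a Gather */
} Path;

static int	paths_freed = 0;	/* instrumentation for the example */

static Path *
ref_path(Path *path)
{
	if (path)
		path->num_refs++;
	return path;
}

static void
unref_path(Path *path)
{
	if (path == NULL)
		return;
	if (--path->num_refs <= 0)
	{
		unref_path(path->subpath);	/* drop our reference to the child */
		free(path);
		paths_freed++;
	}
}

/* Create a path holding a reference to its (optional) child. */
static Path *
make_path(Path *subpath)
{
	Path	   *p = calloc(1, sizeof(Path));

	p->subpath = ref_path(subpath);
	return p;
}
```

Evicting a parent path then reclaims any child whose only remaining
reference was that parent, which is the memory the current add_path()
leaks for non-trivial path trees.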

> In that way, peak memory usage only grows by
> about a factor of 2 rather than a factor equal to the partition count,
> because we don't need to keep every possibly-useful path for every
> partition all at the same time, but rather every possibly-useful path
> for a single partition.
>
> Maybe there are other ideas but I have a feeling any way you slice it
> this is going to be a lot of work.

For the case of carefully hand-crafted partitions, I think users would
expect the planner to find the genuinely best plan, and so may be
willing to accommodate the increased memory usage. Any approach that
does not create the paths for joins between partitions is not
guaranteed to give the best plan, and users willing to provide more
memory will be unhappy if we do not give them the best path.

A user who creates hundreds of partitions will typically be running a
pretty powerful server with plenty of memory. On such servers, the
linear increase in memory for paths may not be as bad as you portray
above, as long as it produces the best plan.

Also, just joining partitioned tables with hundreds of partitions does
not by itself increase the number of paths. The number of paths
increases when two partitioned tables with similar partitioning
schemes are joined with an equality condition on the partition key.
Unless we consider repartitioning, how many of the joining relations
will share the same partitioning scheme? Section 8.6 of the paper
mentions that "no TPC-H query plan, regardless of the partitioning
scheme, contains n-way child joins for n >= 4". The maximum number of
partitions the paper mentions is 168 (Table 3). My VM, which has 8GB
of RAM and 4 cores, handled that case pretty well. We could also add
logic to free the space used by useless paths once join planning is
over, releasing memory for the later stages of query processing.

There will still be users for whom the increase in memory usage is
unexpected. They will need to be educated, or for them we could take
the heuristic PartitionJoinPath-based approach discussed above. But I
don't think the heuristic approach should be the default; maybe we
should supply a GUC to switch between the two.

Some ideas for GUCs:
1. delay_partition_wise_join - when on, use the heuristic
PartitionJoinPath approach.
2. A GUC similar to join_collapse_limit, limiting the number of
partitioned relations joined using the partition-wise join technique.
A value of 1 would mean enable_partition_wise_join = false, so we
could replace enable_partition_wise_join with this GUC.
3. A GUC max_joinable_partitions (open to suggestions for the name),
specifying the maximum number of partitions two relations may have to
be eligible for partition-wise join.
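For illustration only, those proposals might surface in
postgresql.conf roughly like this (none of these GUCs exist today;
names and values are hypothetical):

```
# hypothetical settings sketching the three proposals above
delay_partition_wise_join = off          # on = plan child joins lazily via PartitionJoinPath
partition_wise_join_collapse_limit = 8   # 1 = disable partition-wise join entirely
max_joinable_partitions = 100            # larger partition counts skip the optimization
```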

I guess these GUCs would let users handle the trade-off between
getting the best plan and memory usage consciously. I think users
would rather accept a suboptimal plan knowingly than be handed one
without a choice.

[1] 
http://postgresql.nabble.com/design-for-a-partitioning-feature-was-inheritance-td5921603.html

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e42ef98..6a730ca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2200,20 +2200,75 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
 
 #ifdef OPTIMIZER_DEBUG
 			debug_print_rel(root, rel);
 #endif
 		}
 	}
 
+	/* Release all the memory consumed by unreferenced paths. */
+	for (lev = levels_needed - 1; lev >= 1; lev--)
+	{
+		ListCell *lc;
+		foreach (lc, root->join_rel_level[lev])
+		{
+			ListCell	*lc_path;
+			RelOptInfo *rel = (RelOptInfo *) lfirst(lc);
+
+			elog(NOTICE, "path list length before deleting %d", list_length(rel->pathlist));
+
+			lc_path = list_head(rel->pathlist);
+			while(lc_path)
+			{
+				Path *path = (Path *) lfirst(lc_path);
+
+				lc_path = lnext(lc_path);
+
+				/* Free the path if none references it. */
+				if (path->num_refs == 1)
+				{
+					elog(NOTICE, "freed unreferenced path %p of type %d", path,
+						 path->pathtype);
+
+					rel->pathlist = list_delete_ptr(rel->pathlist, path);
+					free_path(path);
+				}
+			}
+
+			elog(NOTICE, "path list length after deleting %d", list_length(rel->pathlist));
+
+			elog(NOTICE, "partial path list length before deleting %d", list_length(rel->partial_pathlist));
+			/* Do the same for partial pathlist. */
+			lc_path = list_head(rel->partial_pathlist);
+			while(lc_path)
+			{
+				Path *path = (Path *) lfirst(lc_path);
+
+				lc_path = lnext(lc_path);
+
+				/* Free the path if none references it. */
+				if (path->num_refs == 1)
+				{
+					elog(NOTICE, "freed unreferenced path %p of type %d", path,
+						 path->pathtype);
+
+					rel->partial_pathlist = list_delete_ptr(rel->partial_pathlist, path);
+					free_path(path);
+				}
+			}
+
+			elog(NOTICE, "partial path list length after deleting %d", list_length(rel->partial_pathlist));
+		}
+	}
+
 	/*
 	 * We should have a single rel at the final level.
 	 */
 	if (root->join_rel_level[levels_needed] == NIL)
 		elog(ERROR, "failed to build any %d-way joins", levels_needed);
 	Assert(list_length(root->join_rel_level[levels_needed]) == 1);
 
 	rel = (RelOptInfo *) linitial(root->join_rel_level[levels_needed]);
 
 	root->join_rel_level = NULL;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index abb7507..6b34f6a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -39,21 +39,21 @@ typedef enum
 } PathCostComparison;
 
 /*
  * STD_FUZZ_FACTOR is the normal fuzz factor for compare_path_costs_fuzzily.
  * XXX is it worth making this user-controllable?  It provides a tradeoff
  * between planner runtime and the accuracy of path cost comparisons.
  */
 #define STD_FUZZ_FACTOR 1.01
 
 static List *translate_sub_tlist(List *tlist, int relid);
-
+static void unref_path(Path *path);
 
 /*****************************************************************************
  *		MISC. PATH UTILITIES
  *****************************************************************************/
 
 /*
  * compare_path_costs
  *	  Return -1, 0, or +1 according as path1 is cheaper, the same cost,
  *	  or more expensive than path2 for the specified criterion.
  */
@@ -581,22 +581,21 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		 * Remove current element from pathlist if dominated by new.
 		 */
 		if (remove_old)
 		{
 			parent_rel->pathlist = list_delete_cell(parent_rel->pathlist,
 													p1, p1_prev);
 
 			/*
 			 * Delete the data pointed-to by the deleted cell, if possible
 			 */
-			if (!IsA(old_path, IndexPath))
-				pfree(old_path);
+			free_path(old_path);
 			/* p1_prev does not advance */
 		}
 		else
 		{
 			/* new belongs after this old path if it has cost >= old's */
 			if (new_path->total_cost >= old_path->total_cost)
 				insert_after = p1;
 			/* p1_prev advances */
 			p1_prev = p1;
 		}
@@ -610,26 +609,26 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 			break;
 	}
 
 	if (accept_new)
 	{
 		/* Accept the new path: insert it at proper place in pathlist */
 		if (insert_after)
 			lappend_cell(parent_rel->pathlist, insert_after, new_path);
 		else
 			parent_rel->pathlist = lcons(new_path, parent_rel->pathlist);
+		new_path->num_refs++;
 	}
 	else
 	{
 		/* Reject and recycle the new path */
-		if (!IsA(new_path, IndexPath))
-			pfree(new_path);
+		free_path(new_path);
 	}
 }
 
 /*
  * add_path_precheck
  *	  Check whether a proposed new path could possibly get accepted.
  *	  We assume we know the path's pathkeys and parameterization accurately,
  *	  and have lower bounds for its costs.
  *
  * Note that we do not know the path's rowcount, since getting an estimate for
@@ -821,21 +820,21 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 
 		/*
 		 * Remove current element from partial_pathlist if dominated by new.
 		 */
 		if (remove_old)
 		{
 			parent_rel->partial_pathlist =
 				list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev);
 			/* we should not see IndexPaths here, so always safe to delete */
 			Assert(!IsA(old_path, IndexPath));
-			pfree(old_path);
+			free_path(old_path);
 			/* p1_prev does not advance */
 		}
 		else
 		{
 			/* new belongs after this old path if it has cost >= old's */
 			if (new_path->total_cost >= old_path->total_cost)
 				insert_after = p1;
 			/* p1_prev advances */
 			p1_prev = p1;
 		}
@@ -850,27 +849,28 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
 	}
 
 	if (accept_new)
 	{
 		/* Accept the new path: insert it at proper place */
 		if (insert_after)
 			lappend_cell(parent_rel->partial_pathlist, insert_after, new_path);
 		else
 			parent_rel->partial_pathlist =
 				lcons(new_path, parent_rel->partial_pathlist);
+		new_path->num_refs++;
 	}
 	else
 	{
 		/* we should not see IndexPaths here, so always safe to delete */
 		Assert(!IsA(new_path, IndexPath));
 		/* Reject and recycle the new path */
-		pfree(new_path);
+		free_path(new_path);
 	}
 }
 
 /*
  * add_partial_path_precheck
  *	  Check whether a proposed new partial path could possibly get accepted.
  *
  * Unlike add_path_precheck, we can ignore startup cost and parameterization,
  * since they don't matter for partial paths (see add_partial_path).  But
  * we do want to make sure we don't add a partial path if there's already
@@ -1079,20 +1079,21 @@ create_bitmap_heap_path(PlannerInfo *root,
 	pathnode->path.parent = rel;
 	pathnode->path.pathtarget = rel->reltarget;
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel;
 	pathnode->path.parallel_workers = 0;
 	pathnode->path.pathkeys = NIL;		/* always unordered */
 
 	pathnode->bitmapqual = bitmapqual;
+	bitmapqual->num_refs++;
 
 	cost_bitmap_heap_scan(&pathnode->path, root, rel,
 						  pathnode->path.param_info,
 						  bitmapqual, loop_count);
 
 	return pathnode;
 }
 
 /*
  * create_bitmap_and_path
@@ -1228,20 +1229,22 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
 	 * but since it doesn't do any selection or projection, it is a pretty
 	 * cheap node.
 	 */
 	pathnode->path.rows = 0;
 	pathnode->path.startup_cost = 0;
 	pathnode->path.total_cost = 0;
 	foreach(l, subpaths)
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		subpath->num_refs++;
+
 		pathnode->path.rows += subpath->rows;
 
 		if (l == list_head(subpaths))	/* first node? */
 			pathnode->path.startup_cost = subpath->startup_cost;
 		pathnode->path.total_cost += subpath->total_cost;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
 
 		/* All child paths must have same parameterization */
 		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
@@ -1290,20 +1293,21 @@ create_merge_append_path(PlannerInfo *root,
 	/*
 	 * Add up the sizes and costs of the input paths.
 	 */
 	pathnode->path.rows = 0;
 	input_startup_cost = 0;
 	input_total_cost = 0;
 	foreach(l, subpaths)
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
+		subpath->num_refs++;
 		pathnode->path.rows += subpath->rows;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
 
 		if (pathkeys_contained_in(pathkeys, subpath->pathkeys))
 		{
 			/* Subpath is adequately ordered, we won't need to sort it */
 			input_startup_cost += subpath->startup_cost;
 			input_total_cost += subpath->total_cost;
 		}
@@ -1678,20 +1682,21 @@ create_gather_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	pathnode->path.parent = rel;
 	pathnode->path.pathtarget = target;
 	pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
 														  required_outer);
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = false;
 	pathnode->path.parallel_workers = subpath->parallel_workers;
 	pathnode->path.pathkeys = NIL;		/* Gather has unordered result */
 
 	pathnode->subpath = subpath;
+	subpath->num_refs++;
 	pathnode->single_copy = false;
 
 	if (pathnode->path.parallel_workers == 0)
 	{
 		pathnode->path.parallel_workers = 1;
 		pathnode->path.pathkeys = subpath->pathkeys;
 		pathnode->single_copy = true;
 	}
 
 	cost_gather(pathnode, root, rel, pathnode->path.param_info, rows);
@@ -1999,21 +2004,23 @@ create_nestloop_path(PlannerInfo *root,
 								  required_outer,
 								  &restrict_clauses);
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = joinrel->consider_parallel &&
 		outer_path->parallel_safe && inner_path->parallel_safe;
 	/* This is a foolish way to estimate parallel_workers, but for now... */
 	pathnode->path.parallel_workers = outer_path->parallel_workers;
 	pathnode->path.pathkeys = pathkeys;
 	pathnode->jointype = jointype;
 	pathnode->outerjoinpath = outer_path;
+	outer_path->num_refs++;
 	pathnode->innerjoinpath = inner_path;
+	inner_path->num_refs++;
 	pathnode->joinrestrictinfo = restrict_clauses;
 
 	final_cost_nestloop(root, pathnode, workspace, sjinfo, semifactors);
 
 	return pathnode;
 }
 
 /*
  * create_mergejoin_path
  *	  Creates a pathnode corresponding to a mergejoin join between
@@ -2062,21 +2069,23 @@ create_mergejoin_path(PlannerInfo *root,
 								  required_outer,
 								  &restrict_clauses);
 	pathnode->jpath.path.parallel_aware = false;
 	pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
 		outer_path->parallel_safe && inner_path->parallel_safe;
 	/* This is a foolish way to estimate parallel_workers, but for now... */
 	pathnode->jpath.path.parallel_workers = outer_path->parallel_workers;
 	pathnode->jpath.path.pathkeys = pathkeys;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.outerjoinpath = outer_path;
+	outer_path->num_refs++;
 	pathnode->jpath.innerjoinpath = inner_path;
+	inner_path->num_refs++;
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_mergeclauses = mergeclauses;
 	pathnode->outersortkeys = outersortkeys;
 	pathnode->innersortkeys = innersortkeys;
 	/* pathnode->materialize_inner will be set by final_cost_mergejoin */
 
 	final_cost_mergejoin(root, pathnode, workspace, sjinfo);
 
 	return pathnode;
 }
@@ -2136,21 +2145,23 @@ create_hashjoin_path(PlannerInfo *root,
 	 * and then we could assume that the output inherits the outer relation's
 	 * ordering, which might save a sort step.  However there is considerable
 	 * downside if our estimate of the inner relation size is badly off. For
 	 * the moment we don't risk it.  (Note also that if we wanted to take this
 	 * seriously, joinpath.c would have to consider many more paths for the
 	 * outer rel than it does now.)
 	 */
 	pathnode->jpath.path.pathkeys = NIL;
 	pathnode->jpath.jointype = jointype;
 	pathnode->jpath.outerjoinpath = outer_path;
+	outer_path->num_refs++;
 	pathnode->jpath.innerjoinpath = inner_path;
+	inner_path->num_refs++;
 	pathnode->jpath.joinrestrictinfo = restrict_clauses;
 	pathnode->path_hashclauses = hashclauses;
 	/* final_cost_hashjoin will fill in pathnode->num_batches */
 
 	final_cost_hashjoin(root, pathnode, workspace, sjinfo, semifactors);
 
 	return pathnode;
 }
 
 /*
@@ -3202,10 +3213,87 @@ reparameterize_path(PlannerInfo *root, Path *path,
 														 rel,
 														 spath->subpath,
 														 spath->path.pathkeys,
 														 required_outer);
 			}
 		default:
 			break;
 	}
 	return NULL;
 }
+
+void
+free_path(Path *path)
+{
+	/* Decrement the reference counts of paths referenced by this one. */
+	switch(path->pathtype)
+	{
+		case T_SeqScan:
+		case T_IndexScan:
+		case T_IndexOnlyScan:
+			/* Simple paths do nothing. */
+			break;
+
+		case T_MergeJoin:
+		case T_HashJoin:
+		case T_NestLoop:
+			{
+				JoinPath *jpath = (JoinPath *)path;
+				unref_path(jpath->outerjoinpath);
+				unref_path(jpath->innerjoinpath);
+			}
+			break;
+
+		case T_Append:
+		case T_MergeAppend:
+			{
+				AppendPath *appath = (AppendPath *)path;
+				ListCell   *lc;
+
+				foreach (lc, appath->subpaths)
+				{
+					Path *subpath = (Path *) lfirst(lc);
+					unref_path(subpath);
+				}
+			}
+			break;
+
+		case T_Gather:
+			{
+				GatherPath *gpath = (GatherPath *) path;
+				unref_path(gpath->subpath);
+			}
+			break;
+
+		case T_BitmapHeapScan:
+			{
+				BitmapHeapPath *bhpath = (BitmapHeapPath *)path;
+				unref_path(bhpath->bitmapqual);
+			}
+			break;
+
+		default:
+			elog(ERROR, "unrecognized path type %d", path->pathtype);
+			break;
+	}
+
+	/* Now reclaim the memory. */
+	if (!IsA(path, IndexPath))
+		pfree(path);
+}
+
+static void
+unref_path(Path *path)
+{
+	if (!path)
+		return;
+
+	path->num_refs--;
+
+	if (path->num_refs == 0)
+	{
+		elog(NOTICE, "freed an unreferenced path %p of type %d not added to rel pathlist",
+			 path, path->pathtype);
+
+		free_path(path);
+	}
+}
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3a1255a..a58d71b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -891,20 +891,21 @@ typedef struct Path
 	int			parallel_workers;		/* desired # of workers; 0 = not
 										 * parallel */
 
 	/* estimated size/costs for path (see costsize.c for more info) */
 	double		rows;			/* estimated number of result tuples */
 	Cost		startup_cost;	/* cost expended before fetching any tuples */
 	Cost		total_cost;		/* total cost (assuming all tuples fetched) */
 
 	List	   *pathkeys;		/* sort ordering of path's output */
 	/* pathkeys is a List of PathKey nodes; see above */
+	int			num_refs;		/* Number of objects referencing this path. */
 } Path;
 
 /* Macro for extracting a path's parameterization relids; beware double eval */
 #define PATH_REQ_OUTER(path)  \
 	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
 
 /*----------
  * IndexPath represents an index scan over a single index.
  *
  * This struct is used for both regular indexscans and index-only scans;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 44abe83..d658cd3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -211,12 +211,13 @@ extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
 extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first);
+extern void free_path(Path *path);
 
 #endif   /* PATHS_H */
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
