Re: [HACKERS] Partition-wise aggregation/grouping

Jeevan Chalke Thu, 01 Feb 2018 06:00:58 -0800

On Thu, Feb 1, 2018 at 1:11 AM, Robert Haas <robertmh...@gmail.com> wrote:


> On Mon, Jan 29, 2018 at 3:42 AM, Jeevan Chalke
> <jeevan.cha...@enterprisedb.com> wrote:
> > Attached new patch set and rebased it on latest HEAD.
>
> I strongly dislike add_single_path_to_append_rel.  It adds branches
> and complexity to code that is already very complex.  Most
> importantly, why are we adding paths to fields in
> OtherUpperPathExtraData *extra instead of adding them to the path list
> of some RelOptInfo?  If we had an appropriate RelOptInfo to which we
> could add the paths, then we could make this simpler.
>
> If I understand correctly, the reason you're doing it this way is
> because we have no place to put partially-aggregated, non-partial
> paths.  If we only needed to worry about the parallel query case, we
> could just build an append of partially-aggregated paths for each
> child and stick it into the grouped rel's partial pathlist, just as we
> already do for regular parallel aggregation.  There's no reason why
> add_paths_to_grouping_rel() needs to care about the difference a
> Partial Aggregate on top of whatever and an Append each branch of
> which is a Partial Aggregate on top of whatever.  However, this won't
> work for non-partial paths, because add_paths_to_grouping_rel() needs
> to put paths into the grouped rel's pathlist -- and we can't mix
> together partially-aggregated paths and fully-aggregated paths in the
> same path list.
>

Yes.


>
> But, really, the way we're using grouped_rel->partial_pathlist right
> now is an awful hack.  What I'm thinking we could do is introduce a
> new UpperRelationKind called UPPERREL_PARTIAL_GROUP_AGG, coming just
> before UPPERREL_GROUP_AGG.  Without partition-wise aggregate,
> partially_grouped_rel's pathlist would always be NIL, and its partial
> pathlist would be constructed using the logic in
> add_partial_paths_to_grouping_rel, which would need renaming.  Then,
> add_paths_to_grouping_rel would use paths from input_rel when doing
> non-parallel aggregation and paths from partially_grouped_rel when
> doing parallel aggregation.  This would eliminate the ugly
> grouped_rel->partial_pathlist = NIL assignment at the bottom of
> create_grouping_paths(), because the grouped_rel's partial_pathlist
> would never have been (bogusly) populated in the first place, and
> hence would not need to be reset.  All of these changes could be made
> via a preparatory patch.
>

I wrote a patch for this (on current HEAD) and attached separately here.
Please have a look.

I still not yet fully understand how we are going to pass those to the
add_paths_to_append_rel(). I need to look it more deeply though.


> Then the main patch needs to worry about four cases:
>
> 1. Parallel partition-wise aggregate, grouping key doesn't contain
> partition key.  This should just be a matter of adding additional
> Append paths to partially_grouped_rel->partial_pathlist.  The existing
> code already knows how to stick a Gather and FinalizeAggregate step on
> top of that, and I don't see why that logic would need any
> modification or addition.  An Append of child partial-grouping paths
> should be producing the same output as a partial grouping of an
> Append, except that the former case might produce more separate groups
> that need to be merged; but that should be OK: we can just throw all
> the paths into the same path list and let the cheapest one win.
>

For any partial aggregation we need to add finalization step after we are
done with the APPEND i.e. post add_paths_to_append_rel(). Given that we
need to replicate the logic of sticking Gather and FinalizeAggregate step
at later stage. This is what exactly done in create_partition_agg_paths().
Am I missing something here?


> 2. Parallel partition-wise aggregate, grouping key contains partition
> key.  For the most part, this is no different from case #1.  We won't
> have groups spanning different partitions in this case, but we might
> have groups spanning different workers, so we still need a
> FinalizeAggregate step.  As an exception, Gather -> Parallel Append ->
> [non-partial Aggregate path] would give us a way of doing aggregation
> in parallel without a separate Finalize step.  I'm not sure if we want
> to consider that to be in scope for this patch.  If we do, then we'd
> add the Parallel Append path to grouped_rel->partial_pathlist.  Then,
> we could stick Gather (Merge) on top if it to produce a path for
> grouped_rel->pathlist using generate_gather_paths(); alternatively, it
> can be used by upper planning steps -- something we currently can't
> ever make work with parallel aggregation.
>
> 3. Non-parallel partition-wise aggregate, grouping key contains
> partition key.  Build Append paths from the children of grouped_rel
> and add them to grouped_rel->pathlist.
>

Yes.


>
> 3. Non-parallel partition-wise aggregate, grouping key doesn't contain
> partition key.  Build Append paths from the children of
> partially_grouped_rel and add them to partially_grouped_rel->pathlist.
> Also add code to generate paths for grouped_rel->pathlist by sticking
> a FinalizeAggregate step on top of each path from
> partially_grouped_rel->pathlist.
>

Yes, this is done in create_partition_agg_paths().
create_partition_agg_paths() basically adds gather path, if required and
then finalizes it again if required. These steps are similar to that of
add_paths_to_grouping_rel() counterpart which does gather + finalization.


> Overall, what I'm trying to argue for here is making this feature look
> less like its own separate thing and more like part of the general
> mechanism we've already got: partial paths would turn into regular
> paths via generate_gather_paths(), and partially aggregated paths
> would turn into fully aggregated paths by adding FinalizeAggregate.
> The existing special case that allows us to build a non-partial, fully
> aggregated path from a partial, partially-aggregated path would be
> preserved.
>
> I think this would probably eliminate some other problems I see in the
> existing design as well.  For example, create_partition_agg_paths
> doesn't consider using Gather Merge, but that might be a win.


Append path is always non-sorted and has no pathkeys. Thus Gather Merge
over an Append path seems infeasible, isn't it?


>   With
> the design above, I think you never need to call create_gather_path()
> anywhere.  In case #1, the existing code takes care of it.  In the
> special case mentioned under #2, if we chose to support that,
> generate_gather_paths() would take care of it.  Both of those places
> already know about Gather Merge.
>

I don't understand how exactly, will have more careful look over this.


> On another note, I found preferFullAgg to be wicked confusing.  To
> "prefer" something is to like it better, but be willing to accept
> other options if the preference can't be accommodated.  Here, it seems
> like preferFullAgg = false prevents consideration of full aggregation.
> So it's really more like allowFullAgg, or, maybe better,
> try_full_aggregation.  Also, try_partition_wise_grouping has a
> variable isPartialAgg which is always ends up getting set to
> !preferFullAgg.  Having two Boolean variables which are always set to
> the opposite of each other isn't good.  To add to the confusion, the
> code following the place where isPartialAgg is set sometimes refers to
> isPartialAgg and sometimes refers to preferFullAgg.
>

I will have a look over this and commenting part.


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Thanks

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2a4e22b..8b8a567 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -186,15 +186,18 @@ static PathTarget *make_sort_input_target(PlannerInfo *root,
 static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 					  List *targets, List *targets_contain_srfs);
 static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
-						  RelOptInfo *grouped_rel, PathTarget *target,
+						  RelOptInfo *grouped_rel,
+						  RelOptInfo *partially_grouped_rel,
+						  PathTarget *target,
 						  PathTarget *partial_grouping_target,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
 						  grouping_sets_data *gd, bool can_sort, bool can_hash,
 						  double dNumGroups, List *havingQual);
-static void add_partial_paths_to_grouping_rel(PlannerInfo *root,
+static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
 								  RelOptInfo *grouped_rel,
+								  RelOptInfo *partial_grouped_rel,
 								  PathTarget *target,
 								  PathTarget *partial_grouping_target,
 								  AggClauseCosts *agg_partial_costs,
@@ -3627,6 +3630,7 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *grouped_rel;
+	RelOptInfo *partially_grouped_rel;
 	PathTarget *partial_grouping_target = NULL;
 	AggClauseCosts agg_partial_costs;	/* parallel only */
 	AggClauseCosts agg_final_costs; /* parallel only */
@@ -3637,6 +3641,8 @@ create_grouping_paths(PlannerInfo *root,
 
 	/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
 	grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
+	partially_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
+											NULL);
 
 	/*
 	 * If the input relation is not parallel-safe, then the grouped relation
@@ -3817,7 +3823,8 @@ create_grouping_paths(PlannerInfo *root,
 								 &agg_final_costs);
 		}
 
-		add_partial_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+		add_paths_to_partial_grouping_rel(root, input_rel, grouped_rel,
+										  partially_grouped_rel, target,
 										  partial_grouping_target,
 										  &agg_partial_costs, &agg_final_costs,
 										  gd, can_sort, can_hash,
@@ -3825,7 +3832,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	/* Build final grouping paths */
-	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+	add_paths_to_grouping_rel(root, input_rel, grouped_rel,
+							  partially_grouped_rel, target,
 							  partial_grouping_target, agg_costs,
 							  &agg_final_costs, gd, can_sort, can_hash,
 							  dNumGroups, (List *) parse->havingQual);
@@ -5859,7 +5867,9 @@ get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
  */
 static void
 add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
-						  RelOptInfo *grouped_rel, PathTarget *target,
+						  RelOptInfo *grouped_rel,
+						  RelOptInfo *partially_grouped_rel,
+						  PathTarget *target,
 						  PathTarget *partial_grouping_target,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
@@ -5870,6 +5880,12 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	ListCell   *lc;
 
+	/*
+	 * Parallel aggregation's partial paths must be stored in a
+	 * partially_grouped_rel and not in a grouped_rel.
+	 */
+	Assert(grouped_rel->partial_pathlist == NIL);
+
 	if (can_sort)
 	{
 		/*
@@ -5945,10 +5961,13 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 * Now generate a complete GroupAgg Path atop of the cheapest partial
 		 * path.  We can do this using either Gather or Gather Merge.
 		 */
-		if (grouped_rel->partial_pathlist)
+		if (partially_grouped_rel->partial_pathlist)
 		{
-			Path	   *path = (Path *) linitial(grouped_rel->partial_pathlist);
-			double		total_groups = path->rows * path->parallel_workers;
+			Path	   *path;
+			double		total_groups;
+
+			path = (Path *) linitial(partially_grouped_rel->partial_pathlist);
+			total_groups = path->rows * path->parallel_workers;
 
 			path = (Path *) create_gather_path(root,
 											   grouped_rel,
@@ -5999,7 +6018,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 			 */
 			if (parse->groupClause != NIL && root->group_pathkeys != NIL)
 			{
-				foreach(lc, grouped_rel->partial_pathlist)
+				foreach(lc, partially_grouped_rel->partial_pathlist)
 				{
 					Path	   *subpath = (Path *) lfirst(lc);
 					Path	   *gmpath;
@@ -6107,9 +6126,11 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 * again, we'll only do this if it looks as though the hash table
 		 * won't exceed work_mem.
 		 */
-		if (grouped_rel->partial_pathlist)
+		if (partially_grouped_rel->partial_pathlist)
 		{
-			Path	   *path = (Path *) linitial(grouped_rel->partial_pathlist);
+			Path	   *path;
+
+			path = (Path *) linitial(partially_grouped_rel->partial_pathlist);
 
 			hashaggtablesize = estimate_hashagg_tablesize(path,
 														  agg_final_costs,
@@ -6143,15 +6164,16 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 }
 
 /*
- * add_partial_paths_to_grouping_rel
+ * add_paths_to_partial_grouping_rel
  *
  * Add partial paths to grouping relation.  These paths are not fully
  * aggregated; a FinalizeAggregate step is still required.
  */
 static void
-add_partial_paths_to_grouping_rel(PlannerInfo *root,
+add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
 								  RelOptInfo *grouped_rel,
+								  RelOptInfo *partially_grouped_rel,
 								  PathTarget *target,
 								  PathTarget *partial_grouping_target,
 								  AggClauseCosts *agg_partial_costs,
@@ -6199,7 +6221,7 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 													 -1.0);
 
 				if (parse->hasAggs)
-					add_partial_path(grouped_rel, (Path *)
+					add_partial_path(partially_grouped_rel, (Path *)
 									 create_agg_path(root,
 													 grouped_rel,
 													 path,
@@ -6211,7 +6233,7 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 													 agg_partial_costs,
 													 dNumPartialGroups));
 				else
-					add_partial_path(grouped_rel, (Path *)
+					add_partial_path(partially_grouped_rel, (Path *)
 									 create_group_path(root,
 													   grouped_rel,
 													   path,
@@ -6239,7 +6261,7 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 		 */
 		if (hashaggtablesize < work_mem * 1024L)
 		{
-			add_partial_path(grouped_rel, (Path *)
+			add_partial_path(partially_grouped_rel, (Path *)
 							 create_agg_path(root,
 											 grouped_rel,
 											 cheapest_partial_path,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6bf68f3..a10b150 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,8 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_PARTIAL_GROUP_AGG,	/* result of partial grouping/aggregation, if
+								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
 	UPPERREL_WINDOW,			/* result of window functions, if any */
 	UPPERREL_DISTINCT,			/* result of "SELECT DISTINCT", if any */

Re: [HACKERS] Partition-wise aggregation/grouping

Reply via email to