Re: [HACKERS] Partition-wise aggregation/grouping

Ashutosh Bapat Fri, 01 Dec 2017 04:42:02 -0800

Continuing with review of 0007.

+
+    /* Copy input rels's relids to grouped rel */
+    grouped_rel->relids = input_rel->relids;


Isn't this done in fetch_upper_rel()? Why do we need it here?
There's also a similar hunk in create_grouping_paths() which doesn't look
appropriate. I guess, you need relids in grouped_rel->relids for FDW. There are
two ways to do this: 1. set grouped_rel->relids for parent upper rel as well,
but then we should pass relids to fetch_upper_rel() instead of setting those
later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids to
all_baserels, if upper_rel->relids is NULL and don't set relids for a parent
upper rel. I am fine with either of those.

+            /* partial phase */
+            get_agg_clause_costs(root, (Node *) partial_target->exprs,
+                                 AGGSPLIT_INITIAL_SERIAL,
+                                 &agg_partial_costs);

IIUC, the costs for evaluating aggregates would not change per child. They
won't be different for parent and any of the children. So, we should be able to
calculate those once, save in "extra" and use repeatedly.

+        if (can_sort)
+        {
+            Path       *path = cheapest_path;
+
+            if (!(pathkeys_contained_in(root->group_pathkeys,
+                                        path->pathkeys)))
[ .. clipped patch .. ]
+                                           NIL,
+                                           dNumGroups));
+        }

We create two kinds of paths partial paths for parallel query and partial
aggregation paths when group keys do not have partition keys. The comments and
code uses partial to mean both the things, which is rather confusing. May be we
should use term "partial aggregation" explicitly wherever it means that in
comments and in variable names.

I still feel that create_grouping_paths() and create_child_grouping_paths()
have a lot of common code. While I can see that there are some pockets in
create_grouping_paths() which are not required in create_child_grouping_paths()
and vice-versa, may be we should create only one function and move those
pockets under "if (child)" or "if (parent)" kind of conditions. It will be a
maintenance burden to keep these two functions in sync in future. If we do not
keep them in sync, that will introduce bugs.

+
+/*
+ * create_partition_agg_paths
+ *
+ * Creates append path for all the partitions and adds into the grouped rel.

I think you want to write "Creates an append path containing paths from all the
child grouped rels and adds into the given parent grouped rel".

+ * For partial mode we need to add a finalize agg node over append path before
+ * adding a path to grouped rel.
+ */
+void
+create_partition_agg_paths(PlannerInfo *root,
+                           RelOptInfo *grouped_rel,
+                           List *subpaths,
+                           List *partial_subpaths,

Why do we have these two as separate arguments? I don't see any call to
create_partition_agg_paths() through add_append_path() passing both of them
non-NULL simultaneously. May be you want use a single list subpaths and another
boolean indicating whether it's list of partial paths or regular paths.

+
+    /* For non-partial path, just create a append path and we are done. */
This is the kind of confusion, I am talking about above. Here you have
mentioned "non-partial path" which may mean a regular path but what you
actually mean by that term is a path representing partial aggregates.

+    /*
+     * Partial partition-wise grouping paths.  Need to create final
+     * aggregation path over append relation.
+     *
+     * If there are partial subpaths, then we need to add gather path before we
+     * append these subpaths.

More confusion here.

+     */
+    if (partial_subpaths)
+    {
+        ListCell   *lc;
+
+        Assert(subpaths == NIL);
+
+        foreach(lc, partial_subpaths)
+        {
+            Path       *path = lfirst(lc);
+            double        total_groups = path->rows * path->parallel_workers;
+
+            /* Create gather paths for partial subpaths */
+            Path *gpath = (Path *) create_gather_path(root, grouped_rel, path,
+                                                      path->pathtarget, NULL,
+                                                      &total_groups);
+
+            subpaths = lappend(subpaths, gpath);

Using the argument variable is confusing and that's why you need two different
List variables. Instead probably you could have another variable local to this
function to hold the gather subpaths.

AFAIU, the Gather paths that this code creates has its parent set to
parent grouped
rel. That's not correct. These partial paths come from children of grouped rel
and each gather is producing rows corresponding to one children of grouped rel.
So gather path's parent should be set to corresponding child and not parent
grouped rel.

This code creates plans where there are multiple Gather nodes under an Append
node. AFAIU, the workers assigned to one gather node can be reused until that
Gather node finishes. Having multiple Gather nodes under an Append mean that
every worker will be idle from the time that worker finishes the work till the
last worker finishes the work. That doesn't seem to be optimal use of workers.
The plan that we create with Gather on top of Append seems to be better. So, we
should avoid creating one Gather node per child plans. Have we tried to compare
performance of these two plans?

+        if (!IsA(apath, MergeAppendPath) && root->group_pathkeys)
+        {
+            spath = (Path *) create_sort_path(root,
+                                              grouped_rel,
+                                              apath,
+                                              root->group_pathkeys,
+                                              -1.0);
+        }

The code here assumes that a MergeAppend path will always have pathkeys
matching group_pathkeys. I believe that's true but probably we should have an
Assert to make it clear and add comments. If that's not true, we will need to
sort the output of MergeAppend OR discard MergeAppend paths which do not have
pathkeys matching group_pathkeys.


diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b422050..1941468 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2345,6 +2345,7 @@ UnlistenStmt
 UnresolvedTup
 UnresolvedTupData
 UpdateStmt
+UpperPathExtraData
 UpperRelationKind
 UpperUniquePath
 UserAuth

Do we commit this file as part of the feature?
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Partition-wise aggregation/grouping

Reply via email to