Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra Tue, 09 Jul 2019 05:11:49 -0700

On Tue, Jul 09, 2019 at 03:37:03AM +0200, Tomas Vondra wrote:


...

Notice that cost of the second plan is almost double the first one. That
means 0004 does not even generate the first plan, i.e. there are cases
where we don't try to add the explicit sort before passing the path to
generate_gather_paths().

And I think I know why is that - while gather_grouping_paths() tries to
add explicit sort below the gather merge, there are other places that
call generate_gather_paths() that don't do that. In this case it's
probably apply_scanjoin_target_to_paths() which simply builds

 parallel (seq|index) scan + gather merge

and that's it. The problem is likely the same - the code does not know
which pathkeys are "interesting" at that point. We probably need to
teach planner to do this.


I've looked into this again, and yes - that's the reason. I've added
generate_useful_gather_paths() which is a wrapper on top of
generate_gather_paths(). It essentially does what 0001 patch did directly
in generate_gather_paths() so it's more like create_grouping_paths().

And that does the trick - we now generate the cheaper paths, and I don't
see any crashes in regression tests etc.

I still suspect we already have code doing similar checks whether pathkeys
might be useful somewhere. I've looked into pathkeys.c and elsewhere but
no luck.

Attached is a slightly modified patch series:

1) 0001 considers incremental sort paths in various places (adds the new
generate_useful_gather_paths and modifies places calling create_sort_path)

2) 0002 and 0003 are fixes I mentioned before

3) 0004 adds a new GUC force_incremental_sort that (when set to 'on')
tries to nudge the optimizer into using incremental sort by essentially
making it free (i.e. using startup/total costs of the subpath). I've found
this useful when trying to force incremental sorts into plans where it may
not be the best strategy.

I won't have time to hack on this over the next ~2 weeks, but I'll try to
respond to questions when possible.


FWIW tweaking all the create_sort_path() places to also consider adding
incremental sort is a bit tedious and invasive, and it almost doubles
the amount of repetitive code. It's OK for experiment like this, but we
should try handling this in a nicer way (move to a separate function
that does both, or something like that).


This definitely needs more work. We need to refactor it in some way, e.g.
have a function that would consider both explicit sort (on the cheapest
path) and incremental sort (on all paths), and call it from all those
places. Haven't tried it, though.

There's also a couple more places where we do create_sort_path() and don't
consider incremental sort yet - window functions, distinct etc.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From e7e97daf447a91e090809be4f07a5eee650eb5e7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Tue, 9 Jul 2019 00:12:45 +0200
Subject: [PATCH 1/4] fix pathkey processing in generate_gather_paths

---
 src/backend/optimizer/path/allpaths.c   | 365 +++++++++++++++++++++++-
 src/backend/optimizer/plan/createplan.c |  10 +-
 src/backend/optimizer/plan/planner.c    | 301 ++++++++++++++++++-
 src/include/optimizer/paths.h           |   2 +
 4 files changed, 673 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 3efc807164..acddbef064 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -556,7 +556,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
         */
        if (rel->reloptkind == RELOPT_BASEREL &&
                bms_membership(root->all_baserels) != BMS_SINGLETON)
-               generate_gather_paths(root, rel, false);
+               generate_useful_gather_paths(root, rel, false);
 
        /* Now find the cheapest of the paths for this rel */
        set_cheapest(rel);
@@ -2730,6 +2730,367 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo 
*rel, bool override_rows)
        }
 }
 
+/*
+ * Find an equivalence class member expression, all of whose Vars, come from
+ * the indicated relation.
+ */
+static Expr *
+find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+       ListCell   *lc_em;
+
+       foreach(lc_em, ec->ec_members)
+       {
+               EquivalenceMember *em = lfirst(lc_em);
+
+               if (bms_is_subset(em->em_relids, rel->relids) &&
+                       !bms_is_empty(em->em_relids))
+               {
+                       /*
+                        * If there is more than one equivalence member whose 
Vars are
+                        * taken entirely from this relation, we'll be content 
to choose
+                        * any one of those.
+                        */
+                       return em->em_expr;
+               }
+       }
+
+       /* We didn't find any suitable equivalence class expression */
+       return NULL;
+}
+
+/*
+ * get_useful_ecs_for_relation
+ *             Determine which EquivalenceClasses might be involved in useful
+ *             orderings of this relation.
+ *
+ * This function is in some respects a mirror image of the core function
+ * pathkeys_useful_for_merging: for a regular table, we know what indexes
+ * we have and want to test whether any of them are useful.  For a foreign
+ * table, we don't know what indexes are present on the remote side but
+ * want to speculate about which ones we'd like to use if they existed.
+ *
+ * This function returns a list of potentially-useful equivalence classes,
+ * but it does not guarantee that an EquivalenceMember exists which contains
+ * Vars only from the given relation.  For example, given ft1 JOIN t1 ON
+ * ft1.x + t1.x = 0, this function will say that the equivalence class
+ * containing ft1.x + t1.x is potentially useful.  Supposing ft1 is remote and
+ * t1 is local (or on a different server), it will turn out that no useful
+ * ORDER BY clause can be generated.  It's not our job to figure that out
+ * here; we're only interested in identifying relevant ECs.
+ */
+static List *
+get_useful_ecs_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+       List       *useful_eclass_list = NIL;
+       ListCell   *lc;
+       Relids          relids;
+
+       /*
+        * First, consider whether any active EC is potentially useful for a 
merge
+        * join against this relation.
+        */
+       if (rel->has_eclass_joins)
+       {
+               foreach(lc, root->eq_classes)
+               {
+                       EquivalenceClass *cur_ec = (EquivalenceClass *) 
lfirst(lc);
+
+                       if (eclass_useful_for_merging(root, cur_ec, rel))
+                               useful_eclass_list = 
lappend(useful_eclass_list, cur_ec);
+               }
+       }
+
+       /*
+        * Next, consider whether there are any non-EC derivable join clauses 
that
+        * are merge-joinable.  If the joininfo list is empty, we can exit
+        * quickly.
+        */
+       if (rel->joininfo == NIL)
+               return useful_eclass_list;
+
+       /* If this is a child rel, we must use the topmost parent rel to 
search. */
+       if (IS_OTHER_REL(rel))
+       {
+               Assert(!bms_is_empty(rel->top_parent_relids));
+               relids = rel->top_parent_relids;
+       }
+       else
+               relids = rel->relids;
+
+       /* Check each join clause in turn. */
+       foreach(lc, rel->joininfo)
+       {
+               RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+               /* Consider only mergejoinable clauses */
+               if (restrictinfo->mergeopfamilies == NIL)
+                       continue;
+
+               /* Make sure we've got canonical ECs. */
+               update_mergeclause_eclasses(root, restrictinfo);
+
+               /*
+                * restrictinfo->mergeopfamilies != NIL is sufficient to 
guarantee
+                * that left_ec and right_ec will be initialized, per comments 
in
+                * distribute_qual_to_rels.
+                *
+                * We want to identify which side of this merge-joinable clause
+                * contains columns from the relation produced by this 
RelOptInfo. We
+                * test for overlap, not containment, because there could be 
extra
+                * relations on either side.  For example, suppose we've got 
something
+                * like ((A JOIN B ON A.x = B.x) JOIN C ON A.y = C.y) LEFT JOIN 
D ON
+                * A.y = D.y.  The input rel might be the joinrel between A and 
B, and
+                * we'll consider the join clause A.y = D.y. relids contains a
+                * relation not involved in the join class (B) and the 
equivalence
+                * class for the left-hand side of the clause contains a 
relation not
+                * involved in the input rel (C).  Despite the fact that we 
have only
+                * overlap and not containment in either direction, A.y is 
potentially
+                * useful as a sort column.
+                *
+                * Note that it's even possible that relids overlaps neither 
side of
+                * the join clause.  For example, consider A LEFT JOIN B ON A.x 
= B.x
+                * AND A.x = 1.  The clause A.x = 1 will appear in B's joininfo 
list,
+                * but overlaps neither side of B.  In that case, we just skip 
this
+                * join clause, since it doesn't suggest a useful sort order 
for this
+                * relation.
+                */
+               if (bms_overlap(relids, restrictinfo->right_ec->ec_relids))
+                       useful_eclass_list = 
list_append_unique_ptr(useful_eclass_list,
+                                                                               
                                restrictinfo->right_ec);
+               else if (bms_overlap(relids, restrictinfo->left_ec->ec_relids))
+                       useful_eclass_list = 
list_append_unique_ptr(useful_eclass_list,
+                                                                               
                                restrictinfo->left_ec);
+       }
+
+       return useful_eclass_list;
+}
+
+/*
+ * get_useful_pathkeys_for_relation
+ *             Determine which orderings of a relation might be useful.
+ *
+ * Getting data in sorted order can be useful either because the requested
+ * order matches the final output ordering for the overall query we're
+ * planning, or because it enables an efficient merge join.  Here, we try
+ * to figure out which pathkeys to consider.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+       List       *useful_pathkeys_list = NIL;
+       List       *useful_eclass_list;
+       EquivalenceClass *query_ec = NULL;
+       ListCell   *lc;
+
+       /*
+        * Pushing the query_pathkeys to the remote server is always worth
+        * considering, because it might let us avoid a local sort.
+        */
+       if (root->query_pathkeys)
+       {
+               bool            query_pathkeys_ok = true;
+
+               foreach(lc, root->query_pathkeys)
+               {
+                       PathKey    *pathkey = (PathKey *) lfirst(lc);
+                       EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+                       Expr       *em_expr;
+
+                       /*
+                        * The planner and executor don't have any clever 
strategy for
+                        * taking data sorted by a prefix of the query's 
pathkeys and
+                        * getting it to be sorted by all of those pathkeys. 
We'll just
+                        * end up resorting the entire data set.  So, unless we 
can push
+                        * down all of the query pathkeys, forget it.
+                        *
+                        * is_foreign_expr would detect volatile expressions as 
well, but
+                        * checking ec_has_volatile here saves some cycles.
+                        */
+                       if (pathkey_ec->ec_has_volatile ||
+                               !(em_expr = find_em_expr_for_rel(pathkey_ec, 
rel)))
+                       {
+                               query_pathkeys_ok = false;
+                               break;
+                       }
+               }
+
+               if (query_pathkeys_ok)
+                       useful_pathkeys_list = 
list_make1(list_copy(root->query_pathkeys));
+       }
+
+       /* Get the list of interesting EquivalenceClasses. */
+       useful_eclass_list = get_useful_ecs_for_relation(root, rel);
+
+       /* Extract unique EC for query, if any, so we don't consider it again. 
*/
+       if (list_length(root->query_pathkeys) == 1)
+       {
+               PathKey    *query_pathkey = linitial(root->query_pathkeys);
+
+               query_ec = query_pathkey->pk_eclass;
+       }
+
+       /*
+        * As a heuristic, the only pathkeys we consider here are those of 
length
+        * one.  It's surely possible to consider more, but since each one we
+        * choose to consider will generate a round-trip to the remote side, we
+        * need to be a bit cautious here.  It would sure be nice to have a 
local
+        * cache of information about remote index definitions...
+        */
+       foreach(lc, useful_eclass_list)
+       {
+               EquivalenceClass *cur_ec = lfirst(lc);
+               Expr       *em_expr;
+               PathKey    *pathkey;
+
+               /* If redundant with what we did above, skip it. */
+               if (cur_ec == query_ec)
+                       continue;
+
+               /* If no pushable expression for this rel, skip it. */
+               em_expr = find_em_expr_for_rel(cur_ec, rel);
+               if (em_expr == NULL)
+                       continue;
+
+               /* Looks like we can generate a pathkey, so let's do it. */
+               pathkey = make_canonical_pathkey(root, cur_ec,
+                                                                               
 linitial_oid(cur_ec->ec_opfamilies),
+                                                                               
 BTLessStrategyNumber,
+                                                                               
 false);
+               useful_pathkeys_list = lappend(useful_pathkeys_list,
+                                                                          
list_make1(pathkey));
+       }
+
+       return useful_pathkeys_list;
+}
+
+/*
+ * generate_useful_gather_paths
+ *             Generate parallel access paths for a relation by pushing a 
Gather or
+ *             Gather Merge on top of a partial path.
+ *
+ * Unlike generate_gather_paths, this does not look just as pathkeys of the
+ * input paths (aiming to preserve the ordering). It also considers ordering
+ * that might be useful by nodes above the gather merge node, and tries to
+ * add a sort (regular or incremental) to provide that.
+ */
+void
+generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool 
override_rows)
+{
+       ListCell   *lc;
+       double          rows;
+       double     *rowsp = NULL;
+       List       *useful_pathkeys_list = NIL;
+       Path       *cheapest_partial_path = NULL;
+
+       /* If there are no partial paths, there's nothing to do here. */
+       if (rel->partial_pathlist == NIL)
+               return;
+
+       /* Should we override the rel's rowcount estimate? */
+       if (override_rows)
+               rowsp = &rows;
+
+       /* generate the regular gather merge paths */
+       generate_gather_paths(root, rel, override_rows);
+
+       /* consider incremental sort for interesting orderings */
+       useful_pathkeys_list = get_useful_pathkeys_for_relation(root, rel);
+
+       /* used for explicit sort paths */
+       cheapest_partial_path = linitial(rel->partial_pathlist);
+
+       /*
+        * Consider incremental sort paths for each interesting ordering.
+        *
+        * XXX I wonder if we need to consider adding a projection here, as
+        * create_ordered_paths does.
+        */
+       foreach(lc, useful_pathkeys_list)
+       {
+               List       *useful_pathkeys = lfirst(lc);
+               ListCell   *lc2;
+               bool            is_sorted;
+               int                     presorted_keys;
+
+               foreach(lc2, rel->partial_pathlist)
+               {
+                       Path       *subpath = (Path *) lfirst(lc2);
+                       GatherMergePath *path;
+
+                       /* path has no ordering at all, can't use incremental 
sort */
+                       if (subpath->pathkeys == NIL)
+                               continue;
+
+                       is_sorted = 
pathkeys_common_contained_in(useful_pathkeys,
+                                                                               
                         subpath->pathkeys,
+                                                                               
                         &presorted_keys);
+
+                       if (is_sorted)
+                       {
+                               path = create_gather_merge_path(root, rel, 
subpath, rel->reltarget,
+                                                                               
                subpath->pathkeys, NULL, rowsp);
+
+                               add_path(rel, &path->path);
+                               continue;
+                       }
+
+                       /* now we know is_sorted == false */
+
+                       /*
+                        * consider regular sort for cheapest partial path (for 
each
+                        * useful pathkeys)
+                        */
+                       if (cheapest_partial_path == subpath)
+                       {
+                               Path *tmp;
+
+                               tmp = (Path *) create_sort_path(root,
+                                                                               
                rel,
+                                                                               
                subpath,
+                                                                               
                useful_pathkeys,
+                                                                               
                -1.0);
+
+                               rows = tmp->rows * tmp->parallel_workers;
+
+                               path = create_gather_merge_path(root, rel,
+                                                                               
                tmp,
+                                                                               
                rel->reltarget,
+                                                                               
                tmp->pathkeys,
+                                                                               
                NULL,
+                                                                               
                rowsp);
+
+                               add_path(rel, &path->path);
+
+                               /* continue */
+                       }
+
+                       /* finally, consider incremental sort */
+                       if (presorted_keys > 0)
+                       {
+                               Path *tmp;
+
+                               /* Also consider incremental sort. */
+                               tmp = (Path *) 
create_incremental_sort_path(root,
+                                                                               
                                        rel,
+                                                                               
                                        subpath,
+                                                                               
                                        useful_pathkeys,
+                                                                               
                                        presorted_keys,
+                                                                               
                                        -1);
+
+                               path = create_gather_merge_path(root, rel,
+                                                                               
                tmp,
+                                                                               
                rel->reltarget,
+                                                                               
                tmp->pathkeys,
+                                                                               
                NULL,
+                                                                               
                rowsp);
+
+                               add_path(rel, &path->path);
+                       }
+               }
+       }
+}
+
 /*
  * make_rel_from_joinlist
  *       Build access paths using a "joinlist" to guide the join path search.
@@ -2902,7 +3263,7 @@ standard_join_search(PlannerInfo *root, int 
levels_needed, List *initial_rels)
                         * once we know the final targetlist (see 
grouping_planner).
                         */
                        if (lev < levels_needed)
-                               generate_gather_paths(root, rel, false);
+                               generate_useful_gather_paths(root, rel, false);
 
                        /* Find and save the cheapest paths for this rel */
                        set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c 
b/src/backend/optimizer/plan/createplan.c
index bfb52f21ab..c2877942cb 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5932,7 +5932,10 @@ prepare_sort_from_pathkeys(Plan *lefttree, List 
*pathkeys,
                                }
                        }
                        if (!j)
-                               elog(ERROR, "could not find pathkey item to 
sort");
+                       {
+                               elog(WARNING, "could not find pathkey item to 
sort");
+                               Assert(false);
+                       }
 
                        /*
                         * Do we need to insert a Result node?
@@ -6491,7 +6494,10 @@ make_unique_from_pathkeys(Plan *lefttree, List 
*pathkeys, int numCols)
                }
 
                if (!tle)
-                       elog(ERROR, "could not find pathkey item to sort");
+               {
+                       elog(WARNING, "could not find pathkey item to sort");
+                       Assert(false);
+               }
 
                /*
                 * Look up the correct equality operator from the PathKey's 
slightly
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 16996b1bc2..0939f2f7b9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5068,6 +5068,48 @@ create_ordered_paths(PlannerInfo *root,
 
                        add_path(ordered_rel, path);
                }
+
+               /* also consider incremental sorts on all partial paths */
+               {
+                       ListCell *lc;
+                       foreach (lc, input_rel->partial_pathlist)
+                       {
+                               Path       *input_path = (Path *) lfirst(lc);
+                               Path       *sorted_path = input_path;
+                               bool            is_sorted;
+                               int                     presorted_keys;
+
+                               /* already handled above */
+                               if (input_path == cheapest_partial_path)
+                                       continue;
+
+                               is_sorted = 
pathkeys_common_contained_in(root->sort_pathkeys,
+                                                                               
                                 input_path->pathkeys, &presorted_keys);
+
+                               /* also ignore already sorted paths */
+                               if (is_sorted)
+                                       continue;
+
+                               if (presorted_keys > 0)
+                               {
+                                       /* Also consider incremental sort. */
+                                       sorted_path = (Path *) 
create_incremental_sort_path(root,
+                                                                               
                                                                ordered_rel,
+                                                                               
                                                                input_path,
+                                                                               
                                                                
root->sort_pathkeys,
+                                                                               
                                                                presorted_keys,
+                                                                               
                                                                limit_tuples);
+
+                                       /* Add projection step if needed */
+                                       if (sorted_path->pathtarget != target)
+                                               sorted_path = 
apply_projection_to_path(root, ordered_rel,
+                                                                               
                                           sorted_path, target);
+
+                                       add_path(ordered_rel, sorted_path);
+                               }
+                       }
+
+               }
        }
 
        /*
@@ -6484,6 +6526,80 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo 
*input_rel,
                        }
                }
 
+
+               /*
+                * Use any available suitably-sorted path as input, with 
incremental
+                * sort path.
+                */
+               foreach(lc, input_rel->pathlist)
+               {
+                       Path       *path = (Path *) lfirst(lc);
+                       bool            is_sorted;
+                       int                     presorted_keys;
+
+                       is_sorted = 
pathkeys_common_contained_in(root->group_pathkeys,
+                                                                               
                         path->pathkeys,
+                                                                               
                         &presorted_keys);
+
+                       if (is_sorted)
+                               continue;
+
+                       if (presorted_keys == 0)
+                               continue;
+
+                       path = (Path *) create_incremental_sort_path(root,
+                                                                               
                                 grouped_rel,
+                                                                               
                                 path,
+                                                                               
                                 root->group_pathkeys,
+                                                                               
                                 presorted_keys,
+                                                                               
                                 -1.0);
+
+                       /* Now decide what to stick atop it */
+                       if (parse->groupingSets)
+                       {
+                               consider_groupingsets_paths(root, grouped_rel,
+                                                                               
        path, true, can_hash,
+                                                                               
        gd, agg_costs, dNumGroups);
+                       }
+                       else if (parse->hasAggs)
+                       {
+                               /*
+                                * We have aggregation, possibly with plain 
GROUP BY. Make
+                                * an AggPath.
+                                */
+                               add_path(grouped_rel, (Path *)
+                                                create_agg_path(root,
+                                                                               
 grouped_rel,
+                                                                               
 path,
+                                                                               
 grouped_rel->reltarget,
+                                                                               
 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+                                                                               
 AGGSPLIT_SIMPLE,
+                                                                               
 parse->groupClause,
+                                                                               
 havingQual,
+                                                                               
 agg_costs,
+                                                                               
 dNumGroups));
+                       }
+                       else if (parse->groupClause)
+                       {
+                               /*
+                                * We have GROUP BY without aggregation or 
grouping sets.
+                                * Make a GroupPath.
+                                */
+                               add_path(grouped_rel, (Path *)
+                                                create_group_path(root,
+                                                                               
   grouped_rel,
+                                                                               
   path,
+                                                                               
   parse->groupClause,
+                                                                               
   havingQual,
+                                                                               
   dNumGroups));
+                       }
+                       else
+                       {
+                               /* Other cases should have been handled above */
+                               Assert(false);
+                       }
+               }
+
                /*
                 * Instead of operating directly on the input relation, we can
                 * consider finalizing a partially aggregated path.
@@ -6530,6 +6646,53 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo 
*input_rel,
                                                                                
           havingQual,
                                                                                
           dNumGroups));
                        }
+
+                       /* incremental sort */
+                       foreach(lc, partially_grouped_rel->pathlist)
+                       {
+                               Path       *path = (Path *) lfirst(lc);
+                               bool            is_sorted;
+                               int                     presorted_keys;
+
+                               is_sorted = 
pathkeys_common_contained_in(root->group_pathkeys,
+                                                                               
                                 path->pathkeys,
+                                                                               
                                 &presorted_keys);
+
+                               if (is_sorted)
+                                       continue;
+
+                               if (presorted_keys == 0)
+                                       continue;
+
+                               path = (Path *) 
create_incremental_sort_path(root,
+                                                                               
                                         grouped_rel,
+                                                                               
                                         path,
+                                                                               
                                         root->group_pathkeys,
+                                                                               
                                         presorted_keys,
+                                                                               
                                         -1.0);
+
+                               if (parse->hasAggs)
+                                       add_path(grouped_rel, (Path *)
+                                                        create_agg_path(root,
+                                                                               
         grouped_rel,
+                                                                               
         path,
+                                                                               
         grouped_rel->reltarget,
+                                                                               
         parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+                                                                               
         AGGSPLIT_FINAL_DESERIAL,
+                                                                               
         parse->groupClause,
+                                                                               
         havingQual,
+                                                                               
         agg_final_costs,
+                                                                               
         dNumGroups));
+                               else
+                                       add_path(grouped_rel, (Path *)
+                                                        create_group_path(root,
+                                                                               
           grouped_rel,
+                                                                               
           path,
+                                                                               
           parse->groupClause,
+                                                                               
           havingQual,
+                                                                               
           dNumGroups));
+                       }
+
                }
        }
 
@@ -6798,6 +6961,57 @@ create_partial_grouping_paths(PlannerInfo *root,
                                                                                
           dNumPartialGroups));
                        }
                }
+
+               /*
+                * Use any available suitably-sorted path as input, and also 
consider
+                * sorting the cheapest partial path.
+                */
+               foreach(lc, input_rel->pathlist)
+               {
+                       Path       *path = (Path *) lfirst(lc);
+                       bool            is_sorted;
+                       int                     presorted_keys;
+
+                       is_sorted = 
pathkeys_common_contained_in(root->group_pathkeys,
+                                                                               
                         path->pathkeys,
+                                                                               
                         &presorted_keys);
+
+                       /* also ignore already sorted paths */
+                       if (is_sorted)
+                               continue;
+
+                       if (presorted_keys == 0)
+                               continue;
+
+                       /* add incremental sort */
+                       path = (Path *) create_incremental_sort_path(root,
+                                                                               
                                 partially_grouped_rel,
+                                                                               
                                 path,
+                                                                               
                                 root->group_pathkeys,
+                                                                               
                                 presorted_keys,
+                                                                               
                                 -1.0);
+
+                       if (parse->hasAggs)
+                               add_path(partially_grouped_rel, (Path *)
+                                                create_agg_path(root,
+                                                                               
 partially_grouped_rel,
+                                                                               
 path,
+                                                                               
 partially_grouped_rel->reltarget,
+                                                                               
 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+                                                                               
 AGGSPLIT_INITIAL_SERIAL,
+                                                                               
 parse->groupClause,
+                                                                               
 NIL,
+                                                                               
 agg_partial_costs,
+                                                                               
 dNumPartialGroups));
+                       else
+                               add_path(partially_grouped_rel, (Path *)
+                                                create_group_path(root,
+                                                                               
   partially_grouped_rel,
+                                                                               
   path,
+                                                                               
   parse->groupClause,
+                                                                               
   NIL,
+                                                                               
   dNumPartialGroups));
+               }
        }
 
        if (can_sort && cheapest_partial_path != NULL)
@@ -6842,6 +7056,52 @@ create_partial_grouping_paths(PlannerInfo *root,
                                                                                
                           dNumPartialPartialGroups));
                        }
                }
+
+               /* consider incremental sort */
+               foreach(lc, input_rel->partial_pathlist)
+               {
+                       Path       *path = (Path *) lfirst(lc);
+                       bool            is_sorted;
+                       int                     presorted_keys;
+
+                       is_sorted = 
pathkeys_common_contained_in(root->group_pathkeys,
+                                                                               
                         path->pathkeys,
+                                                                               
                         &presorted_keys);
+
+                       if (is_sorted)
+                               continue;
+
+                       if (presorted_keys == 0)
+                               continue;
+
+                       path = (Path *) create_incremental_sort_path(root,
+                                                                               
                                 partially_grouped_rel,
+                                                                               
                                 path,
+                                                                               
                                 root->group_pathkeys,
+                                                                               
                                 presorted_keys,
+                                                                               
                                 -1.0);
+
+                       if (parse->hasAggs)
+                               add_partial_path(partially_grouped_rel, (Path *)
+                                                                
create_agg_path(root,
+                                                                               
                 partially_grouped_rel,
+                                                                               
                 path,
+                                                                               
                 partially_grouped_rel->reltarget,
+                                                                               
                 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+                                                                               
                 AGGSPLIT_INITIAL_SERIAL,
+                                                                               
                 parse->groupClause,
+                                                                               
                 NIL,
+                                                                               
                 agg_partial_costs,
+                                                                               
                 dNumPartialPartialGroups));
+                       else
+                               add_partial_path(partially_grouped_rel, (Path *)
+                                                                
create_group_path(root,
+                                                                               
                   partially_grouped_rel,
+                                                                               
                   path,
+                                                                               
                   parse->groupClause,
+                                                                               
                   NIL,
+                                                                               
                   dNumPartialPartialGroups));
+               }
        }
 
        if (can_hash && cheapest_total_path != NULL)
@@ -6938,6 +7198,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 static void
 gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
 {
+       ListCell   *lc;
        Path       *cheapest_partial_path;
 
        /* Try Gather for unordered paths and Gather Merge for ordered ones. */
@@ -6967,6 +7228,44 @@ gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
 
                add_path(rel, path);
        }
+
+       /* also consider incremental sort on all partial paths */
+       foreach (lc, rel->partial_pathlist)
+       {
+               Path       *path = (Path *) lfirst(lc);
+               bool            is_sorted;
+               int                     presorted_keys;
+               double          total_groups;
+
+               is_sorted = pathkeys_common_contained_in(root->group_pathkeys,
+                                                                               
                 path->pathkeys,
+                                                                               
                 &presorted_keys);
+
+               if (is_sorted)
+                       continue;
+
+               if (presorted_keys == 0)
+                       continue;
+
+               path = (Path *) create_incremental_sort_path(root,
+                                                                               
                         rel,
+                                                                               
                         path,
+                                                                               
                         root->group_pathkeys,
+                                                                               
                         presorted_keys,
+                                                                               
                         -1.0);
+
+               path = (Path *)
+                       create_gather_merge_path(root,
+                                                                        rel,
+                                                                        path,
+                                                                        
rel->reltarget,
+                                                                        
root->group_pathkeys,
+                                                                        NULL,
+                                                                        
&total_groups);
+
+               add_path(rel, path);
+       }
+
 }
 
 /*
@@ -7222,7 +7521,7 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
         * one of the generated paths may turn out to be the cheapest one.
         */
        if (rel->consider_parallel && !IS_OTHER_REL(rel))
-               generate_gather_paths(root, rel, false);
+               generate_useful_gather_paths(root, rel, false);
 
        /*
         * Reassess which paths are the cheapest, now that we've potentially 
added
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index e7a40cec3f..20fa94281b 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -54,6 +54,8 @@ extern RelOptInfo *standard_join_search(PlannerInfo *root, 
int levels_needed,
 
 extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
                                                                  bool 
override_rows);
+extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+                                                                               
 bool override_rows);
 extern int     compute_parallel_worker(RelOptInfo *rel, double heap_pages,
                                                                        double 
index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
-- 
2.21.0

>From 3a7bcb8940911029ba5927e9993a93f86139f7f4 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Tue, 9 Jul 2019 00:13:04 +0200
Subject: [PATCH 2/4] fix costing in cost_incremental_sort

---
 src/backend/optimizer/path/costsize.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index 7f820e7351..c6aa17ba67 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1875,16 +1875,8 @@ cost_incremental_sort(Path *path,
                                   limit_tuples);
 
        /* If we have a LIMIT, adjust the number of groups we'll have to 
return. */
-       if (limit_tuples > 0 && limit_tuples < input_tuples)
-       {
-               output_tuples = limit_tuples;
-               output_groups = floor(output_tuples / group_tuples) + 1;
-       }
-       else
-       {
-               output_tuples = input_tuples;
-               output_groups = input_groups;
-       }
+       output_tuples = input_tuples;
+       output_groups = input_groups;
 
        /*
         * Startup cost of incremental sort is the startup cost of its first 
group
-- 
2.21.0

>From 6a439153321e40777d791b70424424a501485408 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Tue, 9 Jul 2019 00:13:34 +0200
Subject: [PATCH 3/4] fix explain in parallel mode

---
 src/backend/commands/explain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d3f855a12a..925e8236ba 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2775,7 +2775,7 @@ show_incremental_sort_info(IncrementalSortState 
*incrsortstate,
                                                                 
fullsort_spaceUsed, fullsort_group_count);
                                if (prefixsort_instrument)
                                        appendStringInfo(es->str,
-                                                                        ", 
Prefix Sort Method: %s  %s: %ldkB  Groups: %ld",
+                                                                        ", 
Prefix Sort Method: %s  %s: %ldkB  Groups: %ld\n",
                                                                         
prefixsort_sortMethod, prefixsort_spaceType,
                                                                         
prefixsort_spaceUsed, prefixsort_group_count);
                                else
-- 
2.21.0

>From 9dc5530429083195e90129b3eb18f8d7a5f78451 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Tue, 9 Jul 2019 13:36:42 +0200
Subject: [PATCH 4/4] add force_incremental_sort GUC

---
 src/backend/optimizer/path/costsize.c |  8 ++++++++
 src/backend/utils/misc/guc.c          | 10 ++++++++++
 src/include/optimizer/cost.h          |  1 +
 3 files changed, 19 insertions(+)

diff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index c6aa17ba67..ee4487b158 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -140,6 +140,8 @@ bool                enable_parallel_append = true;
 bool           enable_parallel_hash = true;
 bool           enable_partition_pruning = true;
 
+bool           force_incremental_sort = true;
+
 typedef struct
 {
        PlannerInfo *root;
@@ -1907,6 +1909,12 @@ cost_incremental_sort(Path *path,
        path->rows = input_tuples;
        path->startup_cost = startup_cost;
        path->total_cost = startup_cost + run_cost;
+
+       if (force_incremental_sort)
+       {
+               path->startup_cost = input_startup_cost;
+               path->total_cost = input_total_cost;
+       }
 }
 
 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f922ee66a0..d2cc5b56b5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -941,6 +941,16 @@ static struct config_bool ConfigureNamesBool[] =
                true,
                NULL, NULL, NULL
        },
+       {
+               {"force_incremental_sort", PGC_USERSET, QUERY_TUNING_METHOD,
+                       gettext_noop("Makes the incremental sort look like 
no-cost."),
+                       NULL,
+                       GUC_EXPLAIN
+               },
+               &force_incremental_sort,
+               false,
+               NULL, NULL, NULL
+       },
        {
                {"enable_incrementalsort", PGC_USERSET, QUERY_TUNING_METHOD,
                        gettext_noop("Enables the planner's use of incremental 
sort steps."),
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b9d7a77e65..bad1d5a330 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -66,6 +66,7 @@ extern PGDLLIMPORT bool enable_parallel_append;
 extern PGDLLIMPORT bool enable_parallel_hash;
 extern PGDLLIMPORT bool enable_partition_pruning;
 extern PGDLLIMPORT int constraint_exclusion;
+extern PGDLLIMPORT bool force_incremental_sort;
 
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
                                                                  double 
index_pages, PlannerInfo *root);
-- 
2.21.0

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Reply via email to