Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?

David Rowley Tue, 19 Jul 2022 20:03:27 -0700

Hackers,

Currently, if we have a query such as:


SELECT a,b, COUNT(*)
FROM a
INNER JOIN b on a.a = b.x
GROUP BY a,b
ORDER BY a DESC, b;

With enable_hashagg = off, we get the following plan:

              QUERY PLAN
---------------------------------------
 GroupAggregate
   Group Key: a.a, a.b
   ->  Sort
         Sort Key: a.a DESC, a.b
         ->  Merge Join
               Merge Cond: (a.a = b.x)
               ->  Sort
                     Sort Key: a.a
                     ->  Seq Scan on a
               ->  Sort
                     Sort Key: b.x
                     ->  Seq Scan on b

We can see that the merge join picked to sort the input on a.a rather
than a.a DESC.  This is due to the way
select_outer_pathkeys_for_merge() only picks the query_pathkeys as a
prefix of the join pathkeys if we can find all of the join pathkeys in
the query_pathkeys.

I think we can relax this now that we have incremental sort.  I think
a better way to limit this is to allow a prefix of the query_pathkeys
providing that covers *all* of the join pathkeys.  That way, for the
above query, it leaves it open for the planner to do the Merge Join by
sorting by a.a DESC then just do an Incremental Sort to get the
GroupAggregate input sorted by a.b.

I've attached a patch for this and it changes the plan for the above query to:

               QUERY PLAN
----------------------------------------
 GroupAggregate
   Group Key: a.a, a.b
   ->  Incremental Sort
         Sort Key: a.a DESC, a.b
         Presorted Key: a.a
         ->  Merge Join
               Merge Cond: (a.a = b.x)
               ->  Sort
                     Sort Key: a.a DESC
                     ->  Seq Scan on a
               ->  Sort
                     Sort Key: b.x DESC
                     ->  Seq Scan on b

The current behaviour is causing me a bit of trouble in plan
regression for the ORDER BY / DISTINCT aggregate improvement patch
that I have pending.

Is there any reason that we shouldn't do this?

David

diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index 11de286e60..cf08fcac51 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1924,11 +1924,13 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
  * Since we assume here that a sort is required, there is no particular use
  * in matching any available ordering of the outerrel.  (joinpath.c has an
  * entirely separate code path for considering sort-free mergejoins.)  Rather,
- * it's interesting to try to match the requested query_pathkeys so that a
- * second output sort may be avoided; and failing that, we try to list "more
- * popular" keys (those with the most unmatched EquivalenceClass peers)
- * earlier, in hopes of making the resulting ordering useful for as many
- * higher-level mergejoins as possible.
+ * it's interesting to try to match, or match a prefix of the requested
+ * query_pathkeys so that a second output sort may be avoided.  We can get
+ * away with just a prefix of the query_pathkeys when that prefix covers the
+ * entire join condition.  Failing that, we try to list "more popular" keys
+ *  (those with the most unmatched EquivalenceClass peers) earlier, in hopes
+ * of making the resulting ordering useful for as many higher-level mergejoins
+ * as possible.
  */
 List *
 select_outer_pathkeys_for_merge(PlannerInfo *root,
@@ -1998,11 +2000,16 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 
        /*
         * Find out if we have all the ECs mentioned in query_pathkeys; if so we
-        * can generate a sort order that's also useful for final output. There 
is
-        * no percentage in a partial match, though, so we have to have 'em all.
+        * can generate a sort order that's also useful for final output. If we
+        * only have a prefix of the query_pathkeys, and that prefix is the 
entire
+        * join condition, then it's useful to use the prefix as the pathkeys as
+        * this increases the chances that an incremental sort will be able to 
be
+        * used by the upper planner.
         */
        if (root->query_pathkeys)
        {
+               int             matches = 0;
+
                foreach(lc, root->query_pathkeys)
                {
                        PathKey    *query_pathkey = (PathKey *) lfirst(lc);
@@ -2015,6 +2022,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
                        }
                        if (j >= necs)
                                break;                  /* didn't find match */
+
+                       matches++;
                }
                /* if we got to the end of the list, we have them all */
                if (lc == NULL)
@@ -2037,6 +2046,22 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
                                }
                        }
                }
+
+               /*
+                * If we didn't find a full match, but matched all of the join 
clauses
+                * then we'll make use of these as presorted input is better 
than
+                * nothing for the upper planner.
+                */
+               else if (matches == nClauses)
+               {
+                       pathkeys = list_copy_head(root->query_pathkeys, 
matches);
+
+                       /* we have all of the join pathkeys, so nothing more to 
do */
+                       pfree(ecs);
+                       pfree(scores);
+
+                       return pathkeys;
+               }
        }
 
        /*
diff --git a/src/test/regress/expected/tuplesort.out 
b/src/test/regress/expected/tuplesort.out
index ef79574ecf..a2efa179fc 100644
--- a/src/test/regress/expected/tuplesort.out
+++ b/src/test/regress/expected/tuplesort.out
@@ -622,17 +622,18 @@ EXPLAIN (COSTS OFF) :qry;
          ->  GroupAggregate
                Group Key: a.col12
                Filter: (count(*) > 1)
-               ->  Sort
+               ->  Incremental Sort
                      Sort Key: a.col12 DESC, a.col1
+                     Presorted Key: a.col12
                      ->  Merge Join
                            Merge Cond: (a.col12 = b.col12)
                            ->  Sort
-                                 Sort Key: a.col12
+                                 Sort Key: a.col12 DESC
                                  ->  Seq Scan on test_mark_restore a
                            ->  Sort
-                                 Sort Key: b.col12
+                                 Sort Key: b.col12 DESC
                                  ->  Seq Scan on test_mark_restore b
-(16 rows)
+(17 rows)
 
 :qry;
  col12 | count | count | count | count | count 
@@ -660,17 +661,18 @@ EXPLAIN (COSTS OFF) :qry;
          ->  GroupAggregate
                Group Key: a.col12
                Filter: (count(*) > 1)
-               ->  Sort
+               ->  Incremental Sort
                      Sort Key: a.col12 DESC, a.col1
+                     Presorted Key: a.col12
                      ->  Merge Join
                            Merge Cond: (a.col12 = b.col12)
                            ->  Sort
-                                 Sort Key: a.col12
+                                 Sort Key: a.col12 DESC
                                  ->  Seq Scan on test_mark_restore a
                            ->  Sort
-                                 Sort Key: b.col12
+                                 Sort Key: b.col12 DESC
                                  ->  Seq Scan on test_mark_restore b
-(16 rows)
+(17 rows)
 
 :qry;
  col12 | count | count | count | count | count

Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?

Reply via email to