Remove WindowClause PARTITION BY items belonging to redundant pathkeys

David Rowley Wed, 07 Jun 2023 16:37:44 -0700

Recently Markus Winand pointed out to me that the PG15 changes made in
[1] to teach the query planner about monotonic window functions
improved the situation for PostgreSQL on his feature/optimization
timeline for PostgreSQL.  These can be seen in [2].


Unfortunately, if you look at the timeline in [2], we're not quite on
green just yet per Markus's "Not with partition by clause (see below)"
caveat.  This is because nodeWindowAgg.c's use_pass_through code must
be enabled when the WindowClause has a PARTITION BY clause.

The reason for this is that we can't just stop spitting out rows from
the WindowAgg when one partition is done as we still need to deal with
rows from any subsequent partitions and we can only get to those by
continuing to read rows until we find rows belonging to the next
partition.

There is however a missed optimisation here when there is a PARTITION
BY clause, but also some qual exists for the column(s) mentioned in
the partition by clause that makes it so only one partition can exist.
A simple example of that is in the following:

EXPLAIN
SELECT *
FROM
  (SELECT
      relkind,
      pg_relation_size(oid) size,
      rank() OVER (PARTITION BY relkind ORDER BY pg_relation_size(oid) DESC
      ) rank
    FROM pg_class)
WHERE relkind = 'r' AND rank <= 10;

(the subquery may be better imagined as a view)

Here, because of the relkind='r' qual being pushed down into the
subquery, effectively that renders the PARTITION BY relkind clause
redundant.

What the attached patch does is process each WindowClause and removes
any items from the PARTITION BY clause that are columns or expressions
relating to redundant PathKeys.

Effectively, this allows the nodeWindowAgg.c code which stops
processing WindowAgg rows when the run condition is met to work as the
PARTITION BY clause is completely removed in the case of the above
query.  Removing the redundant PARTITION BY items also has the added
benefit of not having to needlessly check if the next row belongs to
the same partition as the last row. For the above, that check is a
waste of time as all rows have relkind = 'r'

I passed the patch along to Markus and he kindly confirmed that we're
now green for this particular optimisation.

I'll add this patch to the July commitfest.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9d9c02ccd
[2] https://use-the-index-luke.com/sql/partial-results/window-functions

diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 1e4dd27dba..deb22e9404 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5999,6 +5999,9 @@ make_window_input_target(PlannerInfo *root,
  *             Create a pathkeys list describing the required input ordering
  *             for the given WindowClause.
  *
+ * Modifies wc's partitionClause to remove any clauses which are deemed
+ * redundant by the pathkey logic.
+ *
  * The required ordering is first the PARTITION keys, then the ORDER keys.
  * In the future we might try to implement windowing using hashing, in which
  * case the ordering could be relaxed, but for now we always sort.
@@ -6007,8 +6010,7 @@ static List *
 make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
                                                 List *tlist)
 {
-       List       *window_pathkeys;
-       List       *window_sortclauses;
+       List       *window_pathkeys = NIL;
 
        /* Throw error if can't sort */
        if (!grouping_is_sortable(wc->partitionClause))
@@ -6022,12 +6024,43 @@ make_pathkeys_for_window(PlannerInfo *root, 
WindowClause *wc,
                                 errmsg("could not implement window ORDER BY"),
                                 errdetail("Window ordering columns must be of 
sortable datatypes.")));
 
-       /* Okay, make the combined pathkeys */
-       window_sortclauses = list_concat_copy(wc->partitionClause, 
wc->orderClause);
-       window_pathkeys = make_pathkeys_for_sortclauses(root,
-                                                                               
                        window_sortclauses,
-                                                                               
                        tlist);
-       list_free(window_sortclauses);
+       /*
+        * First fetch the pathkeys for the PARTITION BY clause.  We can safely
+        * remove any clauses from the wc->partitionClause for redundant 
pathkeys.
+        */
+       if (wc->partitionClause != NIL)
+       {
+               bool            sortable;
+
+               window_pathkeys = make_pathkeys_for_sortclauses_extended(root,
+                                                                               
                                                 &wc->partitionClause,
+                                                                               
                                                 tlist,
+                                                                               
                                                 true,
+                                                                               
                                                 &sortable);
+
+               Assert(sortable);
+       }
+
+       /*
+        * And fetch the pathkeys for the ORDER BY clause.  We must keep any
+        * redundant pathkeys.  Removing these would change the semantics of 
peer
+        * rows during execution.
+        */
+       if (wc->orderClause != NIL)
+       {
+               List       *orderby_pathkeys;
+
+               orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+                                                                               
                                 wc->orderClause,
+                                                                               
                                 tlist);
+
+               /* Okay, make the combined pathkeys */
+               if (window_pathkeys != NIL)
+                       window_pathkeys = append_pathkeys(window_pathkeys, 
orderby_pathkeys);
+               else
+                       window_pathkeys = orderby_pathkeys;
+       }
+
        return window_pathkeys;
 }
 
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 0ca298f5a1..94520103a1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1473,6 +1473,8 @@ typedef struct GroupingSet
  * if the clause originally came from WINDOW, and is NULL if it originally
  * was an OVER clause (but note that we collapse out duplicate OVERs).
  * partitionClause and orderClause are lists of SortGroupClause structs.
+ * partitionClause is sanitized by the query planner to remove any columns or
+ * expressions belonging to redundant PathKeys.
  * If we have RANGE with offset PRECEDING/FOLLOWING, the semantics of that are
  * specified by startInRangeFunc/inRangeColl/inRangeAsc/inRangeNullsFirst
  * for the start offset, or endInRangeFunc/inRange* for the end offset.

Remove WindowClause PARTITION BY items belonging to redundant pathkeys

Reply via email to