On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsanki...@gmail.com> wrote:
>
>
> > On 09/01/23 17:53, David Rowley wrote:
> > I gave some thought to whether doing foreach_delete_current() is safe
> > within a foreach_reverse() loop. I didn't try it, but I couldn't see
> > any reason why not.  It is pretty late here and I'd need to test that
> > to be certain. If it turns out not to be safe then we need to document
> > that fact in the comments of the foreach_reverse() macro and the
> > foreach_delete_current() macro.
>
> I tested foreach_delete_current inside foreach_reverse loop.
>
> It worked fine.

I also thought I'd better test that foreach_delete_current() works
with foreach_reverse(). I can confirm that it *does not* work
correctly.  I guess maybe you only tested the fact that it deleted the
current item and not that the subsequent loop correctly went to the
item directly before the deleted item. There's a problem with that. We
skip an item.

Instead of fixing that, I think it's likely better just to loop
backwards manually with a for() loop, so I've adjusted the patch to
work that way.  It's quite likely that the additional code in
foreach() and what was in foreach_reverse() is slower than looping
manually due to the additional work those macros do to set the cell to
NULL when we run out of cells to loop over.

> I have attached patch with one extra test case (as mentioned above).
> Rest of the changes are looking fine.

I made another pass over the v7 patch and fixed a bug that was
disabling the optimization when the deepest WindowAgg had a
runCondition.  This should have been using llast_node instead of
linitial_node.  The top-level WindowAgg is the last in the list.

I also made a pass over the tests and added a test case for the
runCondition check to make sure we disable the optimization when the
top-level WindowAgg has one of those.  I wasn't sure what your test
comments case numbers were meant to represent. They were not aligned
with the code comments that document when the optimisation is
disabled, they started out aligned, but seemed to go off the rails at
#3. I've now made them follow the comments in create_one_window_path()
and made it more clear what we expect the test outcome to be in each
case.

I've attached the v9 patch. I feel like this patch is quite
self-contained and I'm quite happy with it now.  If there are no
objections soon, I'm planning on pushing it.

David
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 093ace0864..c110696403 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4464,8 +4464,11 @@ create_one_window_path(PlannerInfo *root,
                                           List *activeWindows)
 {
        PathTarget *window_target;
+       Query      *parse = root->parse;
        ListCell   *l;
        List       *topqual = NIL;
+       List       *orderby_pathkeys = NIL;
+       int                     sort_pushdown_idx = -1;
 
        /*
         * Since each window clause could require a different sort order, we 
stack
@@ -4484,6 +4487,75 @@ create_one_window_path(PlannerInfo *root,
         */
        window_target = input_target;
 
+       /*
+        * Here we attempt to minimize the number of sorts which must be 
performed
+        * for queries with an ORDER BY clause.
+        *
+        * It's possible due to select_active_windows() putting the 
WindowClauses
+        * with the lowest tleSortGroupRef last in the activeWindows list that 
the
+        * final WindowClause has a subset of the sort requirements that the
+        * query's ORDER BY clause has.  Below we try to detect this case and if
+        * we find this is true, we consider adjusting the sort that's done for
+        * WindowAggs and include the additional clauses so that no additional
+        * sorting is required for the query's ORDER BY clause.
+        *
+        * We don't try this optimization for the following cases:
+        *
+        * 1. If the query has a DISTINCT clause.  We needn't waste any 
additional
+        * effort for the more strict sort order as if DISTINCT is done via Hash
+        * Aggregate then that's going to undo this work.
+        *
+        * 2. If the query has a LIMIT clause.  The top-level sort will be able 
to
+        * use a top-n sort which might be more efficient than sorting  by the
+        * additional columns.  If the LIMIT does not reduce the        number 
of rows
+        * significantly then this might not be true, but we don't try to 
consider
+        * that here.
+        *
+        * 3. If the top-level WindowClause has a runCondition then this can
+        * filter out tuples and make the final sort cheaper.  If we pushed the
+        * sort down below the WindowAgg then we'd need to sort all rows 
including
+        * ones that the runCondition might filter out.  This may waste effort 
so
+        * we just don't try to push down the sort for this case.
+        *
+        * 4. If the ORDER BY contains any expressions containing WindowFuncs 
then
+        * we can't push down the sort as these obviously must be evaluated 
before
+        * they can be sorted.
+        */
+       if (parse->sortClause != NIL && parse->distinctClause == NIL &&
+               parse->limitCount == NULL &&
+               llast_node(WindowClause, activeWindows)->runCondition == NIL &&
+               !contain_window_function((Node *) 
get_sortgrouplist_exprs(parse->sortClause,
+                                                                               
                                                  root->processed_tlist)))
+       {
+               orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+                                                                               
                                 parse->sortClause,
+                                                                               
                                 root->processed_tlist);
+
+               /*
+                * Loop backwards over the WindowClauses looking for the first
+                * WindowClause which has pathkeys not contained in the
+                * orderby_pathkeys.  What we're looking for here is the lowest 
level
+                * that we push the additional sort requirements down into.  If 
we
+                * can't find any WindowClause with suitable pathkeys here then
+                * sort_pushdown_idx will remain at -1 which will effectively 
disable
+                * the pushdown attempt later in this function.
+                */
+               for (int i = list_length(activeWindows) - 1; i >= 0; i--)
+               {
+                       WindowClause *wc = list_nth_node(WindowClause, 
activeWindows, i);
+                       List       *window_pathkeys;
+
+                       window_pathkeys = make_pathkeys_for_window(root,
+                                                                               
                           wc,
+                                                                               
                           root->processed_tlist);
+
+                       if (pathkeys_contained_in(window_pathkeys, 
orderby_pathkeys))
+                               sort_pushdown_idx = i;
+                       else
+                               break;
+               }
+       }
+
        foreach(l, activeWindows)
        {
                WindowClause *wc = lfirst_node(WindowClause, l);
@@ -4496,6 +4568,25 @@ create_one_window_path(PlannerInfo *root,
                                                                                
                   wc,
                                                                                
                   root->processed_tlist);
 
+               /*
+                * When we get to the sort_pushdown_idx WindowClause, if the 
path is
+                * not already correctly sorted, then just use the pathkeys for 
the
+                * query's ORDER BY clause instead of the WindowClause's 
pathkeys.
+                * When the path is already correctly sorted there's no point in
+                * adjusting the pathkeys as this just moves the sort down 
without
+                * actually reducing the number of sorts which are required in 
the
+                * plan overall.  sort_pushdown_idx will be -1 and never match 
when
+                * this optimization was disabled above.
+                */
+               if (foreach_current_index(l) == sort_pushdown_idx &&
+                       !pathkeys_contained_in(window_pathkeys, path->pathkeys))
+               {
+                       /* check to make sure sort_pushdown_idx was set 
correctly */
+                       Assert(pathkeys_contained_in(window_pathkeys, 
orderby_pathkeys));
+
+                       window_pathkeys = orderby_pathkeys;
+               }
+
                is_sorted = pathkeys_count_contained_in(window_pathkeys,
                                                                                
                path->pathkeys,
                                                                                
                &presorted_keys);
diff --git a/src/test/regress/expected/window.out 
b/src/test/regress/expected/window.out
index b2c6605e60..1a141413f8 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3984,6 +3984,226 @@ ORDER BY depname, empno;
                                  ->  Seq Scan on empsalary
 (12 rows)
 
+CREATE INDEX empsalary_depname_idx ON empsalary(depname);
+SET enable_seqscan TO off;
+-- Ensure we don't push the ORDER BY sort below the WindowAgg when the input
+-- node to the WindowAgg is already sorted correctly.  This would just move
+-- the sort down and not reduce the number of sorts.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Incremental Sort
+   Sort Key: depname, empno
+   Presorted Key: depname
+   ->  WindowAgg
+         ->  Index Scan using empsalary_depname_idx on empsalary
+(5 rows)
+
+DROP INDEX empsalary_depname_idx;
+RESET enable_seqscan;
+-- Ensure we push the ORDER BY sort below the WindowAgg
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+                  QUERY PLAN                   
+-----------------------------------------------
+ WindowAgg
+   ->  Sort
+         Sort Key: depname, empno, enroll_date
+         ->  Seq Scan on empsalary
+(4 rows)
+
+-- A more complex permutation of the above.  Ensure the ORDER BY sort is
+-- pushed down below the WindowAgg when there is a PARTITION BY and ORDER BY
+-- clause in the WindowFunc
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+                  QUERY PLAN                   
+-----------------------------------------------
+ WindowAgg
+   ->  Sort
+         Sort Key: depname, empno, enroll_date
+         ->  Seq Scan on empsalary
+(4 rows)
+
+-- Try more complex permutations of pushing the ORDER BY's sort below the
+-- WindowAgg.  With multiple WindowAggs sharing the same sort, ensure we push
+-- the sort to the correct level.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+       sum(salary) OVER (PARTITION BY depname) depsalary,
+       count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+                      QUERY PLAN                      
+------------------------------------------------------
+ WindowAgg
+   ->  WindowAgg
+         ->  Sort
+               Sort Key: depname, empno, enroll_date
+               ->  WindowAgg
+                     ->  Sort
+                           Sort Key: enroll_date DESC
+                           ->  Seq Scan on empsalary
+(8 rows)
+
+-- As above, but swap the order of the WindowFuncs to ensure their order is
+-- not a factor in the plan choice.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       count(*) OVER (ORDER BY enroll_date DESC) c,
+       sum(salary) OVER (PARTITION BY depname) depsalary,
+       min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+                      QUERY PLAN                      
+------------------------------------------------------
+ WindowAgg
+   ->  WindowAgg
+         ->  Sort
+               Sort Key: depname, empno, enroll_date
+               ->  WindowAgg
+                     ->  Sort
+                           Sort Key: enroll_date DESC
+                           ->  Seq Scan on empsalary
+(8 rows)
+
+-- Ensure that we don't push the ORDER BY's sort down for the following cases.
+-- See the comment in create_one_window_path() for details of each case that
+-- we don't perform the optimization.
+-- Case #1: Don't push the ORDER BY's sort down when there's a DISTINCT clause
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+       depname,
+       min(salary) OVER (PARTITION BY depname) depminsalary,
+       sum(salary) OVER (PARTITION BY depname) depsalary,
+       count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+                                              QUERY PLAN                       
                        
+-------------------------------------------------------------------------------------------------------
+ Unique
+   ->  Incremental Sort
+         Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER 
(?)), (count(*) OVER (?))
+         Presorted Key: depname
+         ->  WindowAgg
+               ->  Sort
+                     Sort Key: depname
+                     ->  WindowAgg
+                           ->  Sort
+                                 Sort Key: enroll_date DESC
+                                 ->  Seq Scan on empsalary
+(11 rows)
+
+-- Case #2: Don't push the ORDER BY's sort down when there's a LIMIT clause
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       min(salary) OVER (PARTITION BY empno) depminsalary,
+       sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname
+LIMIT 1;
+                  QUERY PLAN                   
+-----------------------------------------------
+ Limit
+   ->  Incremental Sort
+         Sort Key: empno, depname
+         Presorted Key: empno
+         ->  WindowAgg
+               ->  Sort
+                     Sort Key: empno
+                     ->  Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: Don't push the ORDER BY's sort down when the top-level WindowAgg
+-- has a runCondition.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+  SELECT depname,
+         empno,
+         row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+         sum(salary) OVER (PARTITION BY depname ORDER BY empno) depsalary
+  FROM empsalary
+  ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn < 3;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Subquery Scan on e
+   ->  Incremental Sort
+         Sort Key: empsalary.depname, empsalary.enroll_date, empsalary.empno
+         Presorted Key: empsalary.depname, empsalary.enroll_date
+         ->  WindowAgg
+               Run Condition: (row_number() OVER (?) < 3)
+               ->  Incremental Sort
+                     Sort Key: empsalary.depname, empsalary.enroll_date
+                     Presorted Key: empsalary.depname
+                     ->  WindowAgg
+                           ->  Sort
+                                 Sort Key: empsalary.depname, empsalary.empno
+                                 ->  Seq Scan on empsalary
+(13 rows)
+
+-- Try a positive version of case #3 where the runCondition exists but it's
+-- not on the final WindowAgg.  Ensure the sort is pushed below the WindowAgg
+-- in this case.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+  SELECT depname,
+         empno,
+         row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+         row_number() OVER (PARTITION BY depname ORDER BY empno) emprn2
+  FROM empsalary
+  ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn2 < 3;
+                                    QUERY PLAN                                 
    
+-----------------------------------------------------------------------------------
+ Subquery Scan on e
+   ->  WindowAgg
+         Filter: ((row_number() OVER (?)) < 3)
+         ->  Incremental Sort
+               Sort Key: empsalary.depname, empsalary.enroll_date, 
empsalary.empno
+               Presorted Key: empsalary.depname
+               ->  WindowAgg
+                     Run Condition: (row_number() OVER (?) < 3)
+                     ->  Sort
+                           Sort Key: empsalary.depname, empsalary.empno
+                           ->  Seq Scan on empsalary
+(11 rows)
+
+-- Case #4: Don't push the ORDER BY's sort down when the ORDER BY contains
+-- WindowFunc results
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+                 QUERY PLAN                 
+--------------------------------------------
+ Incremental Sort
+   Sort Key: empno, (row_number() OVER (?))
+   Presorted Key: empno
+   ->  WindowAgg
+         ->  Sort
+               Sort Key: empno
+               ->  Seq Scan on empsalary
+(7 rows)
+
 RESET enable_hashagg;
 -- Test Sort node reordering
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..6c7de48a2b 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,6 +1324,120 @@ SELECT DISTINCT
 FROM empsalary
 ORDER BY depname, empno;
 
+CREATE INDEX empsalary_depname_idx ON empsalary(depname);
+SET enable_seqscan TO off;
+
+-- Ensure we don't push the ORDER BY sort below the WindowAgg when the input
+-- node to the WindowAgg is already sorted correctly.  This would just move
+-- the sort down and not reduce the number of sorts.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+DROP INDEX empsalary_depname_idx;
+RESET enable_seqscan;
+
+-- Ensure we push the ORDER BY sort below the WindowAgg
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- A more complex permutation of the above.  Ensure the ORDER BY sort is
+-- pushed down below the WindowAgg when there is a PARTITION BY and ORDER BY
+-- clause in the WindowFunc
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Try more complex permutations of pushing the ORDER BY's sort below the
+-- WindowAgg.  With multiple WindowAggs sharing the same sort, ensure we push
+-- the sort to the correct level.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+       sum(salary) OVER (PARTITION BY depname) depsalary,
+       count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- As above, but swap the order of the WindowFuncs to ensure their order is
+-- not a factor in the plan choice.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       count(*) OVER (ORDER BY enroll_date DESC) c,
+       sum(salary) OVER (PARTITION BY depname) depsalary,
+       min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Ensure that we don't push the ORDER BY's sort down for the following cases.
+-- See the comment in create_one_window_path() for details of each case that
+-- we don't perform the optimization.
+
+-- Case #1: Don't push the ORDER BY's sort down when there's a DISTINCT clause
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+       depname,
+       min(salary) OVER (PARTITION BY depname) depminsalary,
+       sum(salary) OVER (PARTITION BY depname) depsalary,
+       count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: Don't push the ORDER BY's sort down when there's a LIMIT clause
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       depname,
+       min(salary) OVER (PARTITION BY empno) depminsalary,
+       sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname
+LIMIT 1;
+
+-- Case #3: Don't push the ORDER BY's sort down when the top-level WindowAgg
+-- has a runCondition.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+  SELECT depname,
+         empno,
+         row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+         sum(salary) OVER (PARTITION BY depname ORDER BY empno) depsalary
+  FROM empsalary
+  ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn < 3;
+
+-- Try a positive version of case #3 where the runCondition exists but it's
+-- not on the final WindowAgg.  Ensure the sort is pushed below the WindowAgg
+-- in this case.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+  SELECT depname,
+         empno,
+         row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+         row_number() OVER (PARTITION BY depname ORDER BY empno) emprn2
+  FROM empsalary
+  ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn2 < 3;
+
+-- Case #4: Don't push the ORDER BY's sort down when the ORDER BY contains
+-- WindowFunc results
+EXPLAIN (COSTS OFF)
+SELECT empno,
+       row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
 RESET enable_hashagg;
 
 -- Test Sort node reordering

Reply via email to