alamb commented on code in PR #14271:
URL: https://github.com/apache/datafusion/pull/14271#discussion_r1936342248


##########
datafusion/sqllogictest/test_files/window.slt:
##########
@@ -5452,3 +5452,89 @@ order by c1, c2, rank1, rank2;
 
 statement ok
 drop table t1;
+
+
+# Set-Monotonic Window Aggregate functions can output results in order
+statement ok
+CREATE EXTERNAL TABLE aggregate_test_100_ordered (
+  c1  VARCHAR NOT NULL,
+  c2  TINYINT NOT NULL,
+  c3  SMALLINT NOT NULL,
+  c4  SMALLINT,
+  c5  INT,
+  c6  BIGINT NOT NULL,
+  c7  SMALLINT NOT NULL,
+  c8  INT NOT NULL,
+  c9  INT UNSIGNED NOT NULL,
+  c10 BIGINT UNSIGNED NOT NULL,
+  c11 FLOAT NOT NULL,
+  c12 DOUBLE NOT NULL,
+  c13 VARCHAR NOT NULL
+)
+STORED AS CSV
+LOCATION '../../testing/data/csv/aggregate_test_100.csv'
+WITH ORDER (c1)
+OPTIONS ('format.has_header' 'true');
+
+statement ok
+set datafusion.optimizer.prefer_existing_sort = true;
+
+query TT
+EXPLAIN SELECT c1, SUM(c9) OVER(PARTITION BY c1) as sum_c9 FROM 
aggregate_test_100_ordered ORDER BY c1, sum_c9;

Review Comment:
   This query should not depend on the particular aggregate to avoid the sort I 
don't think (because it only has a `PARTITION BY` not an `ORDER BY` clause
   
   The plan looks fine to me, but the comments imply this is testing something 
related to the set monotonic aggregate functions
   
   Likewise for the query below with `OVER()`



##########
datafusion/sqllogictest/test_files/window.slt:
##########
@@ -3127,9 +3127,9 @@ physical_plan
 04)------BoundedWindowAggExec: wdw=[row_number() ORDER BY 
[annotated_data_infinite2.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING 
AND CURRENT ROW: Ok(Field { name: "row_number() ORDER BY 
[annotated_data_infinite2.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING 
AND CURRENT ROW", data_type: UInt64, nullable: false, dict_id: 0, 
dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, 
start_bound: Preceding(Int32(NULL)), end_bound: CurrentRow, is_causal: false 
}], mode=[Sorted]
 05)--------StreamingTableExec: partition_sizes=1, projection=[a0, a, b, c, d], 
infinite_source=true, output_ordering=[a@1 ASC NULLS LAST, b@2 ASC NULLS LAST, 
c@3 ASC NULLS LAST]
 
-# this is a negative test for asserting that window functions (other than 
ROW_NUMBER)
-# are not added to ordering equivalence
-# physical plan should contain SortExec.
+# Top level sort is pushed down through BoundedWindowAggExec as its SUM result 
does already satisfy the required
+# global order. The existing sort is for the second-term lexicographical 
ordering requirement, which is being
+# preserved also at lexicographical level during the BoundedWindowAggExec.
 query TT
 EXPLAIN SELECT c9, sum1 FROM (SELECT c9,
                        SUM(c9) OVER(ORDER BY c9 DESC) as sum1

Review Comment:
   I see -- the fact that each subsequent value in the window here has 
additional values added to to it and Sum is increasing means the data is still 
sorted that way 👍 



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -6203,3 +6203,20 @@ physical_plan
 14)--------------PlaceholderRowExec
 15)------------ProjectionExec: expr=[1 as id, 2 as foo]
 16)--------------PlaceholderRowExec
+
+# SortExec is removed if it is coming after one-row producing AggregateExec's 
having an empty group by expression

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to