alamb commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1936342248
########## datafusion/sqllogictest/test_files/window.slt: ########## @@ -5452,3 +5452,89 @@ order by c1, c2, rank1, rank2; statement ok drop table t1; + + +# Set-Monotonic Window Aggregate functions can output results in order +statement ok +CREATE EXTERNAL TABLE aggregate_test_100_ordered ( + c1 VARCHAR NOT NULL, + c2 TINYINT NOT NULL, + c3 SMALLINT NOT NULL, + c4 SMALLINT, + c5 INT, + c6 BIGINT NOT NULL, + c7 SMALLINT NOT NULL, + c8 INT NOT NULL, + c9 INT UNSIGNED NOT NULL, + c10 BIGINT UNSIGNED NOT NULL, + c11 FLOAT NOT NULL, + c12 DOUBLE NOT NULL, + c13 VARCHAR NOT NULL +) +STORED AS CSV +LOCATION '../../testing/data/csv/aggregate_test_100.csv' +WITH ORDER (c1) +OPTIONS ('format.has_header' 'true'); + +statement ok +set datafusion.optimizer.prefer_existing_sort = true; + +query TT +EXPLAIN SELECT c1, SUM(c9) OVER(PARTITION BY c1) as sum_c9 FROM aggregate_test_100_ordered ORDER BY c1, sum_c9; Review Comment: This query should not depend on the particular aggregate to avoid the sort I don't think (because it only has a `PARTITION BY` not an `ORDER BY` clause The plan looks fine to me, but the comments imply this is testing something related to the set monotonic aggregate functions Likewise for the query below with `OVER()` ########## datafusion/sqllogictest/test_files/window.slt: ########## @@ -3127,9 +3127,9 @@ physical_plan 04)------BoundedWindowAggExec: wdw=[row_number() ORDER BY [annotated_data_infinite2.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Ok(Field { name: "row_number() ORDER BY [annotated_data_infinite2.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(Int32(NULL)), end_bound: CurrentRow, is_causal: false }], mode=[Sorted] 05)--------StreamingTableExec: partition_sizes=1, projection=[a0, a, b, c, d], infinite_source=true, output_ordering=[a@1 ASC NULLS LAST, b@2 ASC NULLS LAST, c@3 ASC NULLS LAST] -# this is a negative test for asserting that window functions (other than ROW_NUMBER) -# are not added to ordering equivalence -# physical plan should contain SortExec. +# Top level sort is pushed down through BoundedWindowAggExec as its SUM result does already satisfy the required +# global order. The existing sort is for the second-term lexicographical ordering requirement, which is being +# preserved also at lexicographical level during the BoundedWindowAggExec. query TT EXPLAIN SELECT c9, sum1 FROM (SELECT c9, SUM(c9) OVER(ORDER BY c9 DESC) as sum1 Review Comment: I see -- the fact that each subsequent value in the window here has additional values added to to it and Sum is increasing means the data is still sorted that way 👍 ########## datafusion/sqllogictest/test_files/aggregate.slt: ########## @@ -6203,3 +6203,20 @@ physical_plan 14)--------------PlaceholderRowExec 15)------------ProjectionExec: expr=[1 as id, 2 as foo] 16)--------------PlaceholderRowExec + +# SortExec is removed if it is coming after one-row producing AggregateExec's having an empty group by expression Review Comment: 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org