alamb commented on code in PR #14271:
URL: https://github.com/apache/datafusion/pull/14271#discussion_r1934326726
##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -6203,3 +6203,97 @@ physical_plan
14)--------------PlaceholderRowExec
15)------------ProjectionExec: expr=[1 as id, 2 as foo]
16)--------------PlaceholderRowExec
+
+
+# Set-Monotonic Aggregate functions can output results in order
+statement ok
+CREATE EXTERNAL TABLE aggregate_test_100_ordered (
+ c1 VARCHAR NOT NULL,
+ c2 TINYINT NOT NULL,
+ c3 SMALLINT NOT NULL,
+ c4 SMALLINT,
+ c5 INT,
+ c6 BIGINT NOT NULL,
+ c7 SMALLINT NOT NULL,
+ c8 INT NOT NULL,
+ c9 INT UNSIGNED NOT NULL,
+ c10 BIGINT UNSIGNED NOT NULL,
+ c11 FLOAT NOT NULL,
+ c12 DOUBLE NOT NULL,
+ c13 VARCHAR NOT NULL
+)
+STORED AS CSV
+LOCATION '../../testing/data/csv/aggregate_test_100.csv'
+WITH ORDER (c1)
+OPTIONS ('format.has_header' 'true');
+
+statement ok
+set datafusion.optimizer.prefer_existing_sort = true;
+
+query TT
+EXPLAIN SELECT c1, SUM(c9) as sum_c9 FROM aggregate_test_100_ordered GROUP BY c1 ORDER BY c1, sum_c9;
+----
+logical_plan
+01)Sort: aggregate_test_100_ordered.c1 ASC NULLS LAST, sum_c9 ASC NULLS LAST
+02)--Projection: aggregate_test_100_ordered.c1, sum(aggregate_test_100_ordered.c9) AS sum_c9
+03)----Aggregate: groupBy=[[aggregate_test_100_ordered.c1]], aggr=[[sum(CAST(aggregate_test_100_ordered.c9 AS UInt64))]]
+04)------TableScan: aggregate_test_100_ordered projection=[c1, c9]
+physical_plan
+01)SortPreservingMergeExec: [c1@0 ASC NULLS LAST, sum_c9@1 ASC NULLS LAST]
+02)--ProjectionExec: expr=[c1@0 as c1, sum(aggregate_test_100_ordered.c9)@1 as sum_c9]
+03)----AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[sum(aggregate_test_100_ordered.c9)], ordering_mode=Sorted
Review Comment:
I don't understand this plan.
It seems to say that if we know the data is sorted by `c1` (only), then after
a grouping that preserves the `c1` sort (`ordering_mode=Sorted`) we also know
the data is sorted by `c1, sum(c9)`.
But that isn't always true. For example:
| c1 | c9 |
|--------|--------|
| 1 | 100 |
| 1 | 200 |
| 2 | 10 |
| 2 | 20 |
| 2 | 30 |
The result is not ordered the same way (it would need to be reordered):
| c1 | sum(c9) |
|--------|--------|
| 1 | 300 |
| 2 | 60 |
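
To make the question concrete, here is a minimal Python sketch (not DataFusion code) that replays the example rows. It assumes the aggregate emits one row per `c1` group in input order, as a stand-in for `ordering_mode=Sorted`, and then checks two different readings of "sorted": the `sum(c9)` column on its own, and the pair `(c1, sum(c9))` compared lexicographically. The helper names `grouped_sums` and `is_ascending` are hypothetical.

```python
from itertools import groupby

# Rows (c1, c9) from the example table, already sorted by c1.
rows = [(1, 100), (1, 200), (2, 10), (2, 20), (2, 30)]


def grouped_sums(sorted_rows):
    """Sum c9 per c1, emitting groups in input order.

    Hypothetical stand-in for an aggregation that preserves the c1 ordering.
    """
    return [
        (c1, sum(c9 for _, c9 in group))
        for c1, group in groupby(sorted_rows, key=lambda row: row[0])
    ]


def is_ascending(values):
    """True if every element is <= its successor."""
    return all(a <= b for a, b in zip(values, values[1:]))


result = grouped_sums(rows)

# Reading 1: is the sum column, taken alone, ascending?
sums_only_sorted = is_ascending([total for _, total in result])

# Reading 2: are the (c1, sum) pairs ascending lexicographically?
# Python compares tuples element by element, left to right.
pairs_sorted = is_ascending(result)

print(result, sums_only_sorted, pairs_sorted)
```

For these rows the sum column alone is not ascending, while the lexicographic check on `(c1, sum(c9))` passes because each `c1` value appears only once after grouping. Which of the two readings the plan relies on is the thing to confirm.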