alamb commented on code in PR #14271:
URL: https://github.com/apache/datafusion/pull/14271#discussion_r1934314846


##########
datafusion/functions-aggregate/src/min_max.rs:
##########
@@ -1183,6 +1187,10 @@ impl AggregateUDFImpl for Min {
     fn documentation(&self) -> Option<&Documentation> {
         self.doc()
     }
+
+    fn set_monotonicity(&self, _data_type: &DataType) -> 
AggregateExprSetMonotonicity {

Review Comment:
   ```suggestion
       fn set_monotonicity(&self, _data_type: &DataType) -> 
AggregateExprSetMonotonicity {
           // min is monotonically descending as it always decreases or
           // stays the same as new values are seen
   ```



##########
datafusion/expr/src/udaf.rs:
##########
@@ -635,6 +635,14 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
     fn documentation(&self) -> Option<&Documentation> {
         None
     }
+
+    /// Indicates whether the aggregation function is monotonic as a set 
function. A set
+    /// function is monotonically increasing if its value increases as its 
argument grows
+    /// (as a set). Formally, `f` is a monotonically increasing set function 
if `f(S) >= f(T)`
+    /// whenever `S` is a superset of `T`.

Review Comment:
   I think it would help this documentation if it also gave some examples
   
   ```suggestion
       /// whenever `S` is a superset of `T`.
       ///
       /// For example `count` and `max` are monotonically ascending as their 
values always increase
       /// (or stay the same) as new values are seen. 
       ///
       /// `min` is monotonically descending as its value always decreases or
       /// stays the same as new values are seen. 
   ```



##########
datafusion/expr/src/udaf.rs:
##########
@@ -818,6 +826,26 @@ pub mod aggregate_doc_sections {
     };
 }
 
+/// Status of an Aggregate Expression's Set Monotonicity
+#[derive(Debug, Clone)]
+pub enum AggregateExprSetMonotonicity {
+    /// Ordering exists as ascending

Review Comment:
   I think it would help to explain what ascending means here 
   
   Something like: 
   ```suggestion
       /// Ordering exists as ascending: value of the aggregate always increases
       /// (or stay the same) as new values are seen. 
   ```



##########
datafusion/expr/src/udaf.rs:
##########
@@ -635,6 +635,14 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
     fn documentation(&self) -> Option<&Documentation> {
         None
     }
+
+    /// Indicates whether the aggregation function is monotonic as a set 
function. A set
+    /// function is monotonically increasing if its value increases as its 
argument grows
+    /// (as a set). Formally, `f` is a monotonically increasing set function 
if `f(S) >= f(T)`

Review Comment:
   I think it would also help to copy the formal definition on to 
`AggregateExprSetMonotonicity` as well to make it more likely someone will find 
this



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -6203,3 +6203,97 @@ physical_plan
 14)--------------PlaceholderRowExec
 15)------------ProjectionExec: expr=[1 as id, 2 as foo]
 16)--------------PlaceholderRowExec
+
+
+# Set-Monotonic Aggregate functions can output results in order
+statement ok
+CREATE EXTERNAL TABLE aggregate_test_100_ordered (
+  c1  VARCHAR NOT NULL,
+  c2  TINYINT NOT NULL,
+  c3  SMALLINT NOT NULL,
+  c4  SMALLINT,
+  c5  INT,
+  c6  BIGINT NOT NULL,
+  c7  SMALLINT NOT NULL,
+  c8  INT NOT NULL,
+  c9  INT UNSIGNED NOT NULL,
+  c10 BIGINT UNSIGNED NOT NULL,
+  c11 FLOAT NOT NULL,
+  c12 DOUBLE NOT NULL,
+  c13 VARCHAR NOT NULL
+)
+STORED AS CSV
+LOCATION '../../testing/data/csv/aggregate_test_100.csv'
+WITH ORDER (c1)
+OPTIONS ('format.has_header' 'true');
+
+statement ok
+set datafusion.optimizer.prefer_existing_sort = true;
+
+query TT
+EXPLAIN SELECT c1, SUM(c9) as sum_c9 FROM aggregate_test_100_ordered GROUP BY 
c1 ORDER BY c1, sum_c9;
+----
+logical_plan
+01)Sort: aggregate_test_100_ordered.c1 ASC NULLS LAST, sum_c9 ASC NULLS LAST
+02)--Projection: aggregate_test_100_ordered.c1, 
sum(aggregate_test_100_ordered.c9) AS sum_c9
+03)----Aggregate: groupBy=[[aggregate_test_100_ordered.c1]], 
aggr=[[sum(CAST(aggregate_test_100_ordered.c9 AS UInt64))]]
+04)------TableScan: aggregate_test_100_ordered projection=[c1, c9]
+physical_plan
+01)SortPreservingMergeExec: [c1@0 ASC NULLS LAST, sum_c9@1 ASC NULLS LAST]
+02)--ProjectionExec: expr=[c1@0 as c1, sum(aggregate_test_100_ordered.c9)@1 as 
sum_c9]
+03)----AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], 
aggr=[sum(aggregate_test_100_ordered.c9)], ordering_mode=Sorted

Review Comment:
   I don't understand this plan
   
   It seem to say that if we know the data is sorted by `c1` (only) then we 
also know the data is sorted by `c1, sum(c9)`
   
   But that isn't always true, for example
   
   
   | c1 | c9 |
   |--------|--------|
   | 1 | 100 |
   | 1 | 200 |
   | 2 | 10 |
   | 2 | 20 |
   | 2 | 30 | 
   
   The result is not ordered the same (it would need to be reordered)
   
   | c1 | sum(c9) |
   |--------|--------|
   | 1 | 300 |
   | 2 | 60 |
   



##########
datafusion/functions-aggregate/src/min_max.rs:
##########
@@ -361,6 +361,10 @@ impl AggregateUDFImpl for Max {
     fn documentation(&self) -> Option<&Documentation> {
         self.doc()
     }
+
+    fn set_monotonicity(&self, _data_type: &DataType) -> 
AggregateExprSetMonotonicity {
+        AggregateExprSetMonotonicity::MonotonicallyAscending

Review Comment:
   ```suggestion
           // max is monotonically ascending as it always increases or
           // stays the same as new values are seen
           AggregateExprSetMonotonicity::MonotonicallyAscending
   ```



##########
datafusion/functions-aggregate/src/sum.rs:
##########
@@ -254,6 +254,18 @@ impl AggregateUDFImpl for Sum {
     fn documentation(&self) -> Option<&Documentation> {
         self.doc()
     }
+
+    fn set_monotonicity(&self, data_type: &DataType) -> 
AggregateExprSetMonotonicity {
+        // Sum is only monotonic if its input is unsigned

Review Comment:
   👍 



##########
datafusion/expr/src/udaf.rs:
##########
@@ -818,6 +826,26 @@ pub mod aggregate_doc_sections {
     };
 }
 
+/// Status of an Aggregate Expression's Set Monotonicity
+#[derive(Debug, Clone)]
+pub enum AggregateExprSetMonotonicity {
+    /// Ordering exists as ascending
+    MonotonicallyAscending,
+    /// Ordering exists as descending

Review Comment:
   Likewise here
   
   ```suggestion
       /// Ordering exists as descending: value of the aggregate always 
decreases
       /// (or stay the same) as new values are seen. 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to