vbarua commented on code in PR #13511:
URL: https://github.com/apache/datafusion/pull/13511#discussion_r1976134261


##########
datafusion/core/tests/dataframe/dataframe_functions.rs:
##########
@@ -360,14 +360,15 @@ async fn test_fn_approx_median() -> Result<()> {
 
 #[tokio::test]
 async fn test_fn_approx_percentile_cont() -> Result<()> {
-    let expr = approx_percentile_cont(col("b"), lit(0.5), None);
+    let expr =
+        approx_percentile_cont(Some(vec![col("b").sort(true, false)]), 
lit(0.5), None);
 
     let expected = [
-        "+---------------------------------------------+",
-        "| approx_percentile_cont(test.b,Float64(0.5)) |",
-        "+---------------------------------------------+",
-        "| 10                                          |",
-        "+---------------------------------------------+",
+        
"+----------------------------------------------------------------------------------+",
+        "| approx_percentile_cont(test.b,Float64(0.5)) WITHIN GROUP [test.b 
ASC NULLS LAST] |",

Review Comment:
   This name looks wrong to me. Shouldn't the `test.b` argument be only in the 
WITHIN GROUP section like
   ```
   approx_percentile_cont(Float64(0.5)) WITHIN GROUP [test.b ASC NULLS LAST]
   ```



##########
datafusion/expr/src/udaf.rs:
##########
@@ -845,6 +861,19 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
         ScalarValue::try_from(data_type)
     }
 
+    /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause, 
return true
+    /// If the function does not, return false
+    /// Otherwise return None (the default)
+    fn supports_null_handling_clause(&self) -> Option<bool> {
+        None
+    }
+
+    /// If this function is ordered-set aggregate function, return true
+    /// If the function is not, return false
+    fn is_ordered_set_aggregate(&self) -> Option<bool> {
+        None

Review Comment:
   Would it make sense for this to just return `true` if is an ordered-set 
aggregate function and `false` otherwise and avoid the Option entirely?



##########
datafusion/expr/src/udaf.rs:
##########
@@ -845,6 +861,19 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
         ScalarValue::try_from(data_type)
     }
 
+    /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause, 
return true
+    /// If the function does not, return false
+    /// Otherwise return None (the default)
+    fn supports_null_handling_clause(&self) -> Option<bool> {
+        None
+    }

Review Comment:
   Is this something we need? From what I know, there aren't any aggregate 
functions that have options for null handling. At the moment, the 2 overrides 
you have of this both return `Some(false)`, which is what I would consider the 
default value anyways.
   
   Speaking of which, if we do need this, do we need to return an 
`Optional<bool>` or could we just return `bool` directly?



##########
datafusion/functions-aggregate/src/approx_percentile_cont.rs:
##########
@@ -51,29 +52,43 @@ create_func!(ApproxPercentileCont, 
approx_percentile_cont_udaf);
 
 /// Computes the approximate percentile continuous of a set of numbers
 pub fn approx_percentile_cont(
-    expression: Expr,
+    within_group: Option<Vec<Sort>>,

Review Comment:
   Good call going from `Expr` to `Sort` here, as this lets us capture ASC/DESC 
which does appear to be valid in some engines (eg. 
[SQLServer](https://learn.microsoft.com/en-us/sql/t-sql/functions/approx-percentile-cont-transact-sql?view=sql-server-ver16)).
 It would probably be good to add a test for this if there isn't already one.
   
   Separately, why is this `Option<Vec<Sort>>`? As I understand it 
`approx_percentile_cont` must always have a WITHIN GROUP clause with a single 
ordering expression, so I would expect this to just be `Sort` to reflect that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to