vbarua commented on code in PR #13511: URL: https://github.com/apache/datafusion/pull/13511#discussion_r1976134261
########## datafusion/core/tests/dataframe/dataframe_functions.rs: ########## @@ -360,14 +360,15 @@ async fn test_fn_approx_median() -> Result<()> { #[tokio::test] async fn test_fn_approx_percentile_cont() -> Result<()> { - let expr = approx_percentile_cont(col("b"), lit(0.5), None); + let expr = + approx_percentile_cont(Some(vec![col("b").sort(true, false)]), lit(0.5), None); let expected = [ - "+---------------------------------------------+", - "| approx_percentile_cont(test.b,Float64(0.5)) |", - "+---------------------------------------------+", - "| 10 |", - "+---------------------------------------------+", + "+----------------------------------------------------------------------------------+", + "| approx_percentile_cont(test.b,Float64(0.5)) WITHIN GROUP [test.b ASC NULLS LAST] |", Review Comment: This name looks wrong to me. Shouldn't the `test.b` argument be only in the WITHIN GROUP section like ``` approx_percentile_cont(Float64(0.5)) WITHIN GROUP [test.b ASC NULLS LAST] ``` ########## datafusion/expr/src/udaf.rs: ########## @@ -845,6 +861,19 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { ScalarValue::try_from(data_type) } + /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause, return true + /// If the function does not, return false + /// Otherwise return None (the default) + fn supports_null_handling_clause(&self) -> Option<bool> { + None + } + + /// If this function is ordered-set aggregate function, return true + /// If the function is not, return false + fn is_ordered_set_aggregate(&self) -> Option<bool> { + None Review Comment: Would it make sense for this to just return `true` if is an ordered-set aggregate function and `false` otherwise and avoid the Option entirely? ########## datafusion/expr/src/udaf.rs: ########## @@ -845,6 +861,19 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { ScalarValue::try_from(data_type) } + /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause, return true + /// If the function does not, return false + /// Otherwise return None (the default) + fn supports_null_handling_clause(&self) -> Option<bool> { + None + } Review Comment: Is this something we need? From what I know, there aren't any aggregate functions that have options for null handling. At the moment, the 2 overrides you have of this both return `Some(false)`, which is what I would consider the default value anyways. Speaking of which, if we do need this, do we need to return an `Optional<bool>` or could we just return `bool` directly? ########## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ########## @@ -51,29 +52,43 @@ create_func!(ApproxPercentileCont, approx_percentile_cont_udaf); /// Computes the approximate percentile continuous of a set of numbers pub fn approx_percentile_cont( - expression: Expr, + within_group: Option<Vec<Sort>>, Review Comment: Good call going from `Expr` to `Sort` here, as this lets us capture ASC/DESC which does appear to be valid in some engines (eg. [SQLServer](https://learn.microsoft.com/en-us/sql/t-sql/functions/approx-percentile-cont-transact-sql?view=sql-server-ver16)). It would probably be good to add a test for this if there isn't already one. Separately, why is this `Option<Vec<Sort>>`? As I understand it `approx_percentile_cont` must always have a WITHIN GROUP clause with a single ordering expression, so I would expect this to just be `Sort` to reflect that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org