vbarua commented on code in PR #13511: URL: https://github.com/apache/datafusion/pull/13511#discussion_r1972589737
########## datafusion/core/src/physical_planner.rs: ########## @@ -1619,13 +1620,20 @@ pub fn create_aggregate_expr_with_name_and_maybe_filter( == NullTreatment::IgnoreNulls; let (agg_expr, filter, order_by) = { - let physical_sort_exprs = match order_by { - Some(exprs) => Some(create_physical_sort_exprs( - exprs, + let physical_sort_exprs = match within_group { + Some(within_group) => Some(create_physical_sort_exprs( + within_group, logical_input_schema, execution_props, )?), - None => None, + None => match order_by { + Some(order_by) => Some(create_physical_sort_exprs( + order_by, + logical_input_schema, + execution_props, + )?), + None => None, + }, Review Comment: As an example of what I mean by the fact `within_groups` is redundant with `order_by` in the internal representation, here they both end up being processed identically. ########## datafusion/expr/src/expr.rs: ########## @@ -724,6 +727,7 @@ impl AggregateFunction { filter, order_by, null_treatment, + within_group, Review Comment: At the SQL level, it makes sense to distinguish between WITHIN GROUP in something like ```sql percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression) ``` and ORDER BY in something like ```sql array_agg ( anynonarray ORDER BY input_sort_columns ) ``` but in the internal plan tree it feels redundant to track `order_by` and `within_group` separately. The `WITHIN GROUP (ORDER BY ...)` syntax is just a different way of setting the `order_by`. I think this PR would be vastly simplified if you collapsed `within_group` into `order_by` in the internal tree, so that the only part of the code that cares about the difference is the SQL layer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org