vbarua commented on code in PR #13511:
URL: https://github.com/apache/datafusion/pull/13511#discussion_r1972589737


##########
datafusion/core/src/physical_planner.rs:
##########
@@ -1619,13 +1620,20 @@ pub fn create_aggregate_expr_with_name_and_maybe_filter(
                 == NullTreatment::IgnoreNulls;
 
             let (agg_expr, filter, order_by) = {
-                let physical_sort_exprs = match order_by {
-                    Some(exprs) => Some(create_physical_sort_exprs(
-                        exprs,
+                let physical_sort_exprs = match within_group {
+                    Some(within_group) => Some(create_physical_sort_exprs(
+                        within_group,
                         logical_input_schema,
                         execution_props,
                     )?),
-                    None => None,
+                    None => match order_by {
+                        Some(order_by) => Some(create_physical_sort_exprs(
+                            order_by,
+                            logical_input_schema,
+                            execution_props,
+                        )?),
+                        None => None,
+                    },

Review Comment:
   As an example of what I mean by the fact `within_groups` is redundant with 
`order_by` in the internal representation, here they both end up being 
processed identically.



##########
datafusion/expr/src/expr.rs:
##########
@@ -724,6 +727,7 @@ impl AggregateFunction {
             filter,
             order_by,
             null_treatment,
+            within_group,

Review Comment:
   At the SQL level, it makes sense to distinguish between WITHIN GROUP in 
something like
   ```sql
   percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)
   ```
   and ORDER BY in something like
   ```sql
   array_agg ( anynonarray ORDER BY input_sort_columns ) 
   ```
   but in the internal plan tree it feels redundant to track `order_by` and 
`within_group` separately. The `WITHIN GROUP (ORDER BY ...)` syntax is just a 
different way of setting the `order_by`.
   
   I think this PR would be vastly simplified if you collapsed `within_group` 
into `order_by` in the internal tree, so that the only part of the code that 
cares about the difference is the SQL layer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to