alamb commented on code in PR #16625: URL: https://github.com/apache/datafusion/pull/16625#discussion_r2228413719
########## datafusion/physical-plan/src/aggregates/mod.rs: ########## @@ -1071,13 +1072,26 @@ fn get_aggregate_expr_req( aggr_expr: &AggregateFunctionExpr, group_by: &PhysicalGroupBy, agg_mode: &AggregateMode, + include_soft_requirement: bool, Review Comment: can we please update the documentation of this function to include a description of what `include_soft_requirement` means? ########## datafusion/functions-aggregate-common/src/order.rs: ########## @@ -25,6 +25,10 @@ pub enum AggregateOrderSensitivity { /// The aggregator can not produce a correct result unless its ordering /// requirement is satisfied. HardRequirement, + /// Indicates that the aggregate expression strongly prefers the input to be ordered. + /// The aggregator can produce its result correctly regardless of the input ordering, + /// This is a similar to, but stronger than, `Beneficial`. Review Comment: I do think it is critical to capture the implications of SoftRequirement in comments for future readers. Here is another proposal: ```suggestion /// Indicates that the aggregator is more efficient when the input is ordered /// but can still produce its result correctly regardless of the input ordering. /// This is a similar to, but stronger than, [`Self::Beneficial`]. /// /// Similarly to [`Self::HardRequirement`], when possible DataFusion will insert /// a `SortExec`, to reorder the input to match the SoftRequirement. However, /// when such a `SortExec` cannot be inserted, (for example, due to conflicting /// [`Self::HardRequirements`] with other ordered aggregates in the query), /// the aggregate function will still execute, without the preferred order, unlike with /// with [`Self::HardRequirement`] ``` ########## datafusion/physical-plan/src/aggregates/mod.rs: ########## @@ -1142,60 +1156,73 @@ pub fn get_finer_aggregate_exprs_requirement( agg_mode: &AggregateMode, ) -> Result<Vec<PhysicalSortRequirement>> { let mut requirement = None; - for aggr_expr in aggr_exprs.iter_mut() { - let Some(aggr_req) = get_aggregate_expr_req(aggr_expr, group_by, agg_mode) - .and_then(|o| eq_properties.normalize_sort_exprs(o)) - else { - // There is no aggregate ordering requirement, or it is trivially - // satisfied -- we can skip this expression. - continue; - }; - // If the common requirement is finer than the current expression's, - // we can skip this expression. If the latter is finer than the former, - // adopt it if it is satisfied by the equivalence properties. Otherwise, - // defer the analysis to the reverse expression. - let forward_finer = determine_finer(&requirement, &aggr_req); - if let Some(finer) = forward_finer { - if !finer { - continue; - } else if eq_properties.ordering_satisfy(aggr_req.clone())? { - requirement = Some(aggr_req); - continue; - } - } - if let Some(reverse_aggr_expr) = aggr_expr.reverse_expr() { - let Some(rev_aggr_req) = - get_aggregate_expr_req(&reverse_aggr_expr, group_by, agg_mode) - .and_then(|o| eq_properties.normalize_sort_exprs(o)) - else { - // The reverse requirement is trivially satisfied -- just reverse - // the expression and continue with the next one: - *aggr_expr = Arc::new(reverse_aggr_expr); + + for include_soft_requirement in [false, true] { Review Comment: I think some context here would help fugure readers. Something like ```suggestion // First try and find a match for all hard and soft requirements. // If a match can't be found, try a second time just matching hard // requirements. for include_soft_requirement in [false, true] { ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org