rluvaton opened a new issue, #14991:
URL: https://github.com/apache/datafusion/issues/14991

   Let's take for example the following plan:
   ```
   Project
   Aggregate
   Sort
   Project
   Scan
   ```
   
   where the sort is on the aggregate's group-by expressions.
   
   If the `GroupsAccumulator` knows that the groups are sorted, many optimizations become possible.
   
   For example, we can avoid the reordering here:
   
https://github.com/apache/datafusion/blob/2fcab2ef0da474ec000d7410427b9d18afb5820b/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs#L238-L241
   
   Or a `GroupsAccumulator` implementation for `ArrayAgg` could avoid the concat and instead just slice each group, or store the group values once together with the start and end offsets of each group.
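
To illustrate the offsets idea, here is a minimal std-only sketch. It does not use the real DataFusion or Arrow APIs: `offsets_for_sorted_groups` and `group_slice` are hypothetical helpers, and plain slices stand in for Arrow arrays. The assumption is that the group indices are dense and arrive in non-decreasing order, so each group's values already sit next to each other in the input.

```rust
/// Hypothetical helper (not a DataFusion API): for group indices that are
/// already sorted in non-decreasing order, compute list-style offsets so that
/// group `g` owns `values[offsets[g]..offsets[g + 1]]`.
fn offsets_for_sorted_groups(group_indices: &[usize], num_groups: usize) -> Vec<usize> {
    // The slicing trick is only valid when rows of a group are adjacent.
    assert!(
        group_indices.windows(2).all(|w| w[0] <= w[1]),
        "group indices must be sorted"
    );

    let mut offsets = vec![0usize; num_groups + 1];
    // Count the rows that belong to each group.
    for &g in group_indices {
        assert!(g < num_groups, "group index out of range");
        offsets[g + 1] += 1;
    }
    // Turn the counts into running offsets (prefix sum).
    for g in 0..num_groups {
        offsets[g + 1] += offsets[g];
    }
    offsets
}

/// Hypothetical helper: view one group's values without copying or concatenating.
fn group_slice<'a, T>(values: &'a [T], offsets: &[usize], group: usize) -> &'a [T] {
    &values[offsets[group]..offsets[group + 1]]
}
```

With values `["a", "b", "c", "d", "e"]` and group indices `[0, 0, 1, 3, 3]`, group 0 is the slice `["a", "b"]` and group 2 is empty. No per-group buffers are built; only one offsets vector is kept next to the original values.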
   
   Custom implementations of `GroupsAccumulator` could also take advantage of this by:
   1. Knowing that once we get a new group index, we will not see the previous group index again before that group's values are requested (`state`/`evaluate`).
   2. Saving the intermediate state for all groups in a single contiguous vector, which avoids random access when building the output for `state`/`evaluate`.
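
The two points above could look roughly like the following sketch. It is a simplified, std-only model and not the real `GroupsAccumulator` trait: `SortedSumAccumulator`, `emit_completed`, and `evaluate_all` are hypothetical names, values are plain `i64`, and group indices are assumed to be dense and non-decreasing across batches.

```rust
/// Hypothetical sum accumulator for sorted group indices.
/// It does NOT implement DataFusion's `GroupsAccumulator` trait.
struct SortedSumAccumulator {
    /// Running sums for the groups not yet emitted, stored contiguously in group order.
    sums: Vec<i64>,
    /// Group index that corresponds to `sums[0]`.
    first_group: usize,
}

impl SortedSumAccumulator {
    fn new() -> Self {
        Self { sums: Vec::new(), first_group: 0 }
    }

    /// Add `values[i]` to the group `group_indices[i]`.
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize]) {
        assert_eq!(values.len(), group_indices.len());
        for (&v, &g) in values.iter().zip(group_indices) {
            let idx = g
                .checked_sub(self.first_group)
                .expect("group was already emitted");
            if idx >= self.sums.len() {
                // A new group starts; every earlier group is now complete.
                // Groups skipped without rows are padded with 0 in this sketch.
                self.sums.resize(idx + 1, 0);
            } else {
                // Sorted input can only touch the most recent group.
                assert_eq!(idx + 1, self.sums.len(), "group indices must be non-decreasing");
            }
            self.sums[idx] += v;
        }
    }

    /// Emit every completed group and keep the last one, which may still receive rows.
    fn emit_completed(&mut self) -> Vec<i64> {
        let n = self.sums.len().saturating_sub(1);
        self.first_group += n;
        // Output is a sequential drain of the front of the vector, no random access.
        self.sums.drain(..n).collect()
    }

    /// Emit all remaining groups, for use once the input is exhausted.
    fn evaluate_all(&mut self) -> Vec<i64> {
        self.first_group += self.sums.len();
        std::mem::take(&mut self.sums)
    }
}
```

Because only the last group can still change, completed groups can be emitted early and in order, which keeps the retained state small.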
   
   
   Also, it looks like this information is already available in `AggregateExec`:
https://github.com/apache/datafusion/blob/2fcab2ef0da474ec000d7410427b9d18afb5820b/datafusion/physical-plan/src/aggregates/mod.rs#L393-L394

