rluvaton opened a new issue, #14991: URL: https://github.com/apache/datafusion/issues/14991
Take the following plan as an example:

```
Project
Aggregate
Sort
Project
Scan
```

where the sort is on the aggregate expressions.

If the `GroupsAccumulator` knew that the groups arrive sorted, several optimizations would become possible. For example, we could avoid the reordering here:

https://github.com/apache/datafusion/blob/2fcab2ef0da474ec000d7410427b9d18afb5820b/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs#L238-L241

Similarly, a `GroupsAccumulator` implementation for `ArrayAgg` could avoid the concat and instead slice each group, or store the group values together with the start and end offsets of each group.

Custom implementations of `GroupsAccumulator` could also take advantage of this by:

1. Knowing that once we move on to a new group index, we will not see the previous one again until that group's values are requested (`state`/`evaluate`).
2. Saving the intermediate state for all groups in a single contiguous vector, which avoids random access when building the output for `state`/`evaluate`.

It also looks like we already have this information in `AggregateExec`:

https://github.com/apache/datafusion/blob/2fcab2ef0da474ec000d7410427b9d18afb5820b/datafusion/physical-plan/src/aggregates/mod.rs#L393-L394
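As a rough illustration of the two numbered points, here is a minimal standalone sketch. It does not use the real `GroupsAccumulator` trait: `SortedGroupsCollector`, its fields, and its method signatures are hypothetical names made up for this example, and it assumes group indices arrive in non-decreasing order.

```rust
/// Hypothetical sketch, not a DataFusion API: collects `i64` values per group,
/// assuming group indices arrive in non-decreasing order.
#[derive(Debug, Default)]
struct SortedGroupsCollector {
    /// Values of all groups, stored back to back in arrival order.
    values: Vec<i64>,
    /// `starts[g]` is the position in `values` where group `g` begins.
    starts: Vec<usize>,
}

impl SortedGroupsCollector {
    /// Appends one batch. Panics if a group index goes backwards, because
    /// that would break the sortedness assumption this layout relies on.
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize]) {
        assert_eq!(values.len(), group_indices.len());
        for (&value, &group) in values.iter().zip(group_indices) {
            assert!(
                group + 1 >= self.starts.len(),
                "group {} seen again after a later group started",
                group
            );
            // Open every group up to `group`; skipped groups stay empty.
            while self.starts.len() <= group {
                self.starts.push(self.values.len());
            }
            self.values.push(value);
        }
    }

    /// Returns one slice per group without copying or reordering.
    fn evaluate(&self) -> Vec<&[i64]> {
        (0..self.starts.len())
            .map(|g| {
                let end = self
                    .starts
                    .get(g + 1)
                    .copied()
                    .unwrap_or(self.values.len());
                &self.values[self.starts[g]..end]
            })
            .collect()
    }
}
```

With this layout, feeding values `[1, 2, 3]` with group indices `[0, 0, 1]` gives the slices `[1, 2]` and `[3]`. Each group is a contiguous range of one vector, so building the output needs only the start offsets and no per-group concat.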