rluvaton commented on issue #14991: URL: https://github.com/apache/datafusion/issues/14991#issuecomment-2698202576
Actually, it uses `GroupsAccumulator` even if the input is fully sorted. You can see this by adding a breakpoint at https://github.com/apache/datafusion/blob/ac79ef3442e65f6197c7234da9fad964895b9101/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs#L118 and running the following `slt`:

```slt
statement ok
CREATE TABLE test_table (
  col_i32 INT,
  col_u32 INT UNSIGNED
) as VALUES
( NULL, NULL),
( -2147483648, 0),
( -2147483648, 0),
( 100, 100),
( 2147483647, 4294967295),
( NULL, NULL),
( -2147483648, 0),
( -2147483648, 0),
( 100, 100),
( 2147483646, 4294967294),
( 2147483647, 4294967295 )

query II
select col_i32, sum(col_u32) sum_col_u32 from (select * from test_table order by col_i32 limit 10) group by col_i32
----
2147483647 8589934590
-2147483648 0
100 200
2147483646 4294967294
NULL NULL
```

You will see that even though `InputOrderMode` is `Sorted`, the `GroupsAccumulator` is still used.

(I think we should be using the group accumulator for sorted or partially sorted data, to avoid combining all the scalars from the `Accumulator`s into an array.)
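To illustrate the last point, here is a minimal sketch in plain Rust of the two aggregation styles being compared. The types `SumAccumulator` and `GroupedSum` and the two helper functions are hypothetical stand-ins written for this example; they are not DataFusion's actual traits or signatures. The sketch only assumes that the per-group style keeps one accumulator object per group and gathers one scalar from each at the end, while the grouped style keeps every group's state in a single vector indexed by group id.

```rust
// Hypothetical stand-ins for illustration only; NOT DataFusion's real API.

/// Per-group style: one accumulator object per group, each yielding one scalar.
struct SumAccumulator {
    sum: Option<u64>,
}

impl SumAccumulator {
    fn new() -> Self {
        Self { sum: None }
    }

    /// NULL inputs (None) are skipped, as SQL SUM does.
    fn update(&mut self, value: Option<u64>) {
        if let Some(v) = value {
            self.sum = Some(self.sum.unwrap_or(0) + v);
        }
    }

    fn evaluate(&self) -> Option<u64> {
        self.sum
    }
}

/// Grouped style: a single accumulator holds the state of every group
/// in one contiguous vector indexed by group id.
struct GroupedSum {
    sums: Vec<Option<u64>>,
}

impl GroupedSum {
    fn new() -> Self {
        Self { sums: Vec::new() }
    }

    fn update_batch(
        &mut self,
        values: &[Option<u64>],
        group_indices: &[usize],
        total_groups: usize,
    ) {
        self.sums.resize(total_groups, None);
        for (value, &group) in values.iter().zip(group_indices) {
            if let Some(v) = value {
                let slot = &mut self.sums[group];
                *slot = Some(slot.unwrap_or(0) + v);
            }
        }
    }

    /// The state vector already is the output column; nothing to gather.
    fn evaluate(self) -> Vec<Option<u64>> {
        self.sums
    }
}

/// Per-group path: the final step gathers one scalar from each accumulator.
fn per_group_sums(
    values: &[Option<u64>],
    group_indices: &[usize],
    total_groups: usize,
) -> Vec<Option<u64>> {
    let mut accs: Vec<SumAccumulator> =
        (0..total_groups).map(|_| SumAccumulator::new()).collect();
    for (value, &group) in values.iter().zip(group_indices) {
        accs[group].update(*value);
    }
    accs.iter().map(|a| a.evaluate()).collect()
}

/// Grouped path: one accumulator, one output vector.
fn grouped_sums(
    values: &[Option<u64>],
    group_indices: &[usize],
    total_groups: usize,
) -> Vec<Option<u64>> {
    let mut acc = GroupedSum::new();
    acc.update_batch(values, group_indices, total_groups);
    acc.evaluate()
}
```

Both functions return the same sums for the same input; the difference is only in where the state lives and whether a final scalar-gathering step is needed.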