Re: [I] Expose to `AccumulatorArgs` whether all the groups are sorted [datafusion]

via GitHub Tue, 04 Mar 2025 08:14:50 -0800


rluvaton commented on issue #14991:
URL: https://github.com/apache/datafusion/issues/14991#issuecomment-2698202576


   Actually, it uses `GroupsAccumulator` even if the input is fully sorted.
   
   You can see this by adding a breakpoint at 
https://github.com/apache/datafusion/blob/ac79ef3442e65f6197c7234da9fad964895b9101/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs#L118
   
   and running the following `slt`:
   ```slt
   
   statement ok
   CREATE TABLE test_table (
       col_i32 INT,
       col_u32 INT UNSIGNED
   ) as VALUES
   ( NULL,        NULL),
   ( -2147483648, 0),
   ( -2147483648, 0),
   ( 100,         100),
   ( 2147483647,  4294967295),
   ( NULL,        NULL),
   ( -2147483648, 0),
   ( -2147483648, 0),
   ( 100,         100),
   ( 2147483646,  4294967294),
   ( 2147483647,  4294967295 )
   
   
   query II
   select col_i32, sum(col_u32) sum_col_u32 from (select * from test_table order by col_i32 limit 10) group by col_i32
   ----
   2147483647 8589934590
   -2147483648 0
   100 200
   2147483646 4294967294
   NULL NULL
   
   ```
   
   You will see that even though `InputOrderMode` is `Sorted`, the 
`GroupsAccumulator` is still used.
   
   (I think we should be using the groups accumulator for sorted or partially 
sorted data, to avoid combining all the scalars from the `Accumulator`s into an array.)
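
To illustrate that last point, here is a minimal standalone Rust sketch. It does not use DataFusion: `RowSum`, `GroupedSum`, and the helper functions are hypothetical stand-ins for the per-group `Accumulator` style and the `GroupsAccumulator` style, and the input rows are assumed to be the 10 rows that survive the `limit 10` above, in ascending key order with NULL last.

```rust
// Simplified stand-ins for illustration only, NOT DataFusion's actual traits.

/// Row-style accumulator: one instance per group, yields a single scalar.
#[derive(Default)]
struct RowSum {
    sum: Option<u64>,
}

impl RowSum {
    fn update(&mut self, value: Option<u64>) {
        if let Some(v) = value {
            self.sum = Some(self.sum.unwrap_or(0) + v);
        }
    }

    fn evaluate(&self) -> Option<u64> {
        self.sum
    }
}

/// Group-style accumulator: one instance for all groups, state is columnar.
#[derive(Default)]
struct GroupedSum {
    sums: Vec<Option<u64>>,
}

impl GroupedSum {
    fn update_batch(&mut self, values: &[Option<u64>], group_indices: &[usize], total_groups: usize) {
        self.sums.resize(total_groups, None);
        for (value, &g) in values.iter().zip(group_indices) {
            if let Some(v) = value {
                self.sums[g] = Some(self.sums[g].unwrap_or(0) + v);
            }
        }
    }

    fn evaluate(self) -> Vec<Option<u64>> {
        // The state is already the output column; nothing to gather.
        self.sums
    }
}

/// Assign consecutive group indices to keys that are already sorted.
fn group_indices_sorted(keys: &[Option<i32>]) -> (Vec<usize>, usize) {
    let mut indices = Vec::with_capacity(keys.len());
    let mut current = 0usize;
    for (i, key) in keys.iter().enumerate() {
        if i > 0 && *key != keys[i - 1] {
            current += 1;
        }
        indices.push(current);
    }
    let total = if keys.is_empty() { 0 } else { current + 1 };
    (indices, total)
}

fn sum_with_row_accumulators(keys: &[Option<i32>], values: &[Option<u64>]) -> Vec<Option<u64>> {
    let (indices, total) = group_indices_sorted(keys);
    let mut accs: Vec<RowSum> = (0..total).map(|_| RowSum::default()).collect();
    for (value, &g) in values.iter().zip(&indices) {
        accs[g].update(*value);
    }
    // Extra step: every per-group scalar has to be combined into one array.
    accs.iter().map(RowSum::evaluate).collect()
}

fn sum_with_grouped_accumulator(keys: &[Option<i32>], values: &[Option<u64>]) -> Vec<Option<u64>> {
    let (indices, total) = group_indices_sorted(keys);
    let mut acc = GroupedSum::default();
    acc.update_batch(values, &indices, total);
    acc.evaluate()
}

fn main() {
    let min = Some(i32::MIN);
    let max = Some(i32::MAX);
    let keys = [min, min, min, min, Some(100), Some(100), Some(2147483646), max, max, None];
    let values = [
        Some(0), Some(0), Some(0), Some(0), Some(100), Some(100),
        Some(4294967294), Some(4294967295), Some(4294967295), None,
    ];
    let by_row = sum_with_row_accumulators(&keys, &values);
    let by_group = sum_with_grouped_accumulator(&keys, &values);
    assert_eq!(by_row, by_group);
    println!("{:?}", by_group);
}
```

Both paths produce the same sums as the expected `slt` result; the difference is only that the row-style path needs the final gather of scalars, which is the cost the suggestion above aims to avoid.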
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

