Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

via GitHub Sat, 05 Apr 2025 11:32:44 -0700


Dandandan commented on issue #15383:
URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2767253437


   > I'm considering another approach. Maybe I shouldn't use `filter_record_batch` 🤔. It filters every column one by one. I should filter the rows when the accumulator runs `merge_batch` 🤔
   
   Yes, using `filter_record_batch` will copy the entire batch (which is what we are trying to avoid) and will be slower than `take` (in the end it does the same thing).
   I think what we probably want is to get the indices via
   https://docs.rs/arrow/latest/arrow/buffer/struct.BooleanBuffer.html#method.set_indices
   so that only the values at those indices are aggregated.
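
A minimal, dependency-free sketch of that idea, in case it helps the discussion. It is not DataFusion's accumulator code: `SumState`, `set_indices` and `merge_selected` are hypothetical names, and a plain `&[bool]` stands in for arrow's `BooleanBuffer`. The local `set_indices` helper only imitates what `BooleanBuffer::set_indices` provides (an iterator over the positions of the set bits), so the accumulator can read the selected rows in place instead of materializing a filtered copy of the batch first.

```rust
/// Hypothetical stand-in for an accumulator's partial state (a running sum).
#[derive(Debug, Default, PartialEq)]
pub struct SumState {
    pub sum: i64,
    pub count: usize,
}

/// Yields the positions of the `true` entries in `mask`.
/// Assumption: this plays the role of arrow's `BooleanBuffer::set_indices`.
pub fn set_indices(mask: &[bool]) -> impl Iterator<Item = usize> + '_ {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
}

/// Merges only the selected rows into `state`.
/// The input column is read in place; no filtered copy is built.
pub fn merge_selected(state: &mut SumState, values: &[i64], mask: &[bool]) {
    assert_eq!(values.len(), mask.len(), "mask must cover every row");
    for i in set_indices(mask) {
        state.sum += values[i];
        state.count += 1;
    }
}
```

Calling `merge_selected` once per output partition, each with its own mask over the same `values` slice, would aggregate every row exactly once without copying the column.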
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

