[I] [DISCUSSION]: Unified approach for joins to output batches close to `batch_size` [datafusion]

via GitHub Wed, 22 Jan 2025 08:27:44 -0800


comphead opened a new issue, #14238:
URL: https://github.com/apache/datafusion/issues/14238


   `BatchCoalescer` is not used in joins yet, since `CoalesceBatchesExec` 
is placed after joins that have a filter, where the output batches may have a 
lower row count than the target batch size. So why can't we follow the same 
pattern in SMJ (sort-merge join)? If collecting batches inside the join itself 
is more performant, then shouldn't we refactor the other joins as well?
   
   On the other hand, `BatchSplitter` is already used in other joins, and SMJ 
could (should) have it too, as there is no other way of splitting batches 
according to the target batch size.
   
   _Originally posted by @berkaysynnada in 
https://github.com/apache/datafusion/issues/14160#issuecomment-2601723615_
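
   For illustration only, the two ideas discussed above (coalescing small 
batches and splitting large ones) can be sketched together as a single 
re-chunking step. This is a minimal sketch, not DataFusion's implementation: 
`Batch` is a hypothetical stand-in for a record batch (a plain vector of row 
values), and `rechunk` is a made-up helper name, not an existing API.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i32>;

/// Re-chunk a sequence of batches so that every output batch holds exactly
/// `target` rows, except possibly the last one.
///
/// Small inputs are coalesced (buffered until `target` rows are available)
/// and large inputs are split (emitted as soon as the buffer fills up).
fn rechunk(input: Vec<Batch>, target: usize) -> Vec<Batch> {
    assert!(target > 0, "target batch size must be positive");
    let mut out = Vec::new();
    let mut buf: Batch = Vec::with_capacity(target);
    for batch in input {
        for row in batch {
            buf.push(row);
            if buf.len() == target {
                // Buffer is full: emit it and start a fresh one.
                out.push(std::mem::replace(&mut buf, Vec::with_capacity(target)));
            }
        }
    }
    // Flush whatever is left as a final, possibly smaller, batch.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}
```

   With a target of 4, inputs of 1, 2, and 7 rows come out as batches of 4, 4, 
and 2 rows. A real operator would copy whole column slices rather than 
individual rows; the row-by-row loop here only keeps the sketch short.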
               


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

