Re: [I] Excessive Arc-clone in HashJoinStream with StringView on build-side [datafusion]

via GitHub Mon, 23 Jun 2025 03:41:53 -0700


ctsk commented on issue #16206:
URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2994380522


   I think the issue on the take+concat pattern is only tangentially related to 
this issue. Ultimately, you would need a version of the take operation, that 
does not simply clone the vector of data buffers, but rather eagerly allocates 
a new data buffer to hold the string view data. This modification could be 
applied to the current compute::take kernel too.
   
   The interleave kernel already does something like this: It only takes 
references to the data buffers that are actually referenced in the new array - 
that is why it would help somewhat alleviate this issue. If the build side is 
*really large*, this could still mean allocating a vector of 1000+ buffer 
references. The subsequent gc (in CoalesceBatchesExec) drops the vector and 
allocate a new data buffer.
   
   Instead, I think there are 2 direct solutions for this:
   1. Extend the take (or interleave) kernels to eagerly allocate new data 
buffers for string views  (i.e. add a `compute::TakeOption` - unfortunately 
also a breaking change).
   2. Switch from `Vec<Buffer>` to `Arc<[Buffer]>` -  also a breaking change in 
the arrow crate. This would not require writing new kernel-style code - the 
take/interleave operations would remain unchanged.
   
   My PR on executing a GC on the hash join build-side is unfortunately not a 
good solution: This GC operation is expensive, and whether it pays off depends 
on the number of probe side batches - which is not known in advance. The 
benchmark shows that it significantly slows down smaller joins. It would also 
not prevent the CoalesceBatchesExec after the HJ from issuing another GC.
   
   For now, the best way to fix this issue is to work on getting 
https://github.com/apache/arrow-rs/pull/6427 merged. As far as I can tell, it 
is already fully implemented and got stuck due to lack of time of the author to 
add the `ViewBuffers` abstraction.
   
   Lastly, I believe that using interleave, or a good solution to the 
take+concat pattern would be very helpful in making the HJ more efficient in 
general. Most importantly, it could allow avoiding the concat on the build-side 
batches, and remove the CoalesceBatchesExec after the join.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [I] Excessive Arc-clone in HashJoinStream with StringView on build-side [datafusion]

Reply via email to