ctsk commented on issue #16206: URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2994380522
I think the issue on the take+concat pattern is only tangentially related to this issue. Ultimately, you would need a version of the take operation, that does not simply clone the vector of data buffers, but rather eagerly allocates a new data buffer to hold the string view data. This modification could be applied to the current compute::take kernel too. The interleave kernel already does something like this: It only takes references to the data buffers that are actually referenced in the new array - that is why it would help somewhat alleviate this issue. If the build side is *really large*, this could still mean allocating a vector of 1000+ buffer references. The subsequent gc (in CoalesceBatchesExec) drops the vector and allocate a new data buffer. Instead, I think there are 2 direct solutions for this: 1. Extend the take (or interleave) kernels to eagerly allocate new data buffers for string views (i.e. add a `compute::TakeOption` - unfortunately also a breaking change). 2. Switch from `Vec<Buffer>` to `Arc<[Buffer]>` - also a breaking change in the arrow crate. This would not require writing new kernel-style code - the take/interleave operations would remain unchanged. My PR on executing a GC on the hash join build-side is unfortunately not a good solution: This GC operation is expensive, and whether it pays off depends on the number of probe side batches - which is not known in advance. The benchmark shows that it significantly slows down smaller joins. It would also not prevent the CoalesceBatchesExec after the HJ from issuing another GC. For now, the best way to fix this issue is to work on getting https://github.com/apache/arrow-rs/pull/6427 merged. As far as I can tell, it is already fully implemented and got stuck due to lack of time of the author to add the `ViewBuffers` abstraction. Lastly, I believe that using interleave, or a good solution to the take+concat pattern would be very helpful in making the HJ more efficient in general. Most importantly, it could allow avoiding the concat on the build-side batches, and remove the CoalesceBatchesExec after the join. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org