ctsk closed pull request #14234: Perform hashing in CollectLeft HashJoin in parallel
URL: https://github.com/apache/datafusion/pull/14234
ctsk commented on PR #14234:
URL: https://github.com/apache/datafusion/pull/14234#issuecomment-2614112816
I plan to test this again with a larger TPC-H scale factor and compare CollectLeft (parallel hashing) vs. CollectLeft (main branch) vs. repartition joins:
- On SF=1, collectLeft alrea
ctsk opened a new pull request, #14234:
URL: https://github.com/apache/datafusion/pull/14234
## Which issue does this PR close?
This PR is an experiment to perform the hashing part of CollectLeft joins in parallel. Instead of directly coalescing the partitions on the build sid
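The general idea of hashing the build side per partition in parallel, rather than coalescing first and hashing on one thread, can be sketched roughly as follows. This is a minimal, std-only illustration under stated assumptions, not the PR's implementation and not DataFusion's or Arrow's API: a "partition" is assumed to be a plain `Vec<i64>` of join keys, and `hash_key` and `build_hash_table` are hypothetical helpers. Hashes are computed on one scoped thread per partition, then inserted serially into a single table keyed by hash, with row indices numbered as if the partitions had been concatenated.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};
use std::thread;

/// Hypothetical build-side partition: only the join-key column (assumption).
type Partition = Vec<i64>;

/// Hash one key with a shared hasher state so that hashes computed on
/// different threads are comparable with each other.
fn hash_key<S: BuildHasher>(state: &S, key: &i64) -> u64 {
    let mut h = state.build_hasher();
    key.hash(&mut h);
    h.finish()
}

/// Hypothetical helper, not a DataFusion function.
/// Phase 1 (parallel): hash every partition on its own scoped thread.
/// Phase 2 (serial): insert hash -> global row indices into one shared table.
fn build_hash_table<S: BuildHasher + Sync>(
    partitions: &[Partition],
    state: &S,
) -> HashMap<u64, Vec<usize>> {
    // Phase 1: one thread per partition, each returning its vector of hashes.
    let per_partition_hashes: Vec<Vec<u64>> = thread::scope(|s| {
        let handles: Vec<_> = partitions
            .iter()
            .map(|p| s.spawn(move || p.iter().map(|k| hash_key(state, k)).collect::<Vec<u64>>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Phase 2: number rows as if the partitions had been concatenated.
    let mut table: HashMap<u64, Vec<usize>> = HashMap::new();
    let mut offset = 0;
    for hashes in &per_partition_hashes {
        for (i, h) in hashes.iter().enumerate() {
            table.entry(*h).or_default().push(offset + i);
        }
        offset += hashes.len();
    }
    table
}

fn main() {
    let partitions: Vec<Partition> = vec![vec![1, 2, 3], vec![2, 4], vec![3]];
    let state = RandomState::new();
    let table = build_hash_table(&partitions, &state);

    // Probe for key 2. A real join would also compare the actual key values
    // to rule out hash collisions; this sketch skips that check.
    let matches = &table[&hash_key(&state, &2)];
    println!("{:?}", matches);
}
```

In this sketch only the hashing is parallelized; the insert into the shared table stays single-threaded, which keeps the table free of locks at the cost of a serial phase whose length grows with the build side.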