Re: [PR] Perform hashing in CollectLeft HashJoin in parallel [datafusion]

2025-03-09 Thread via GitHub
ctsk closed pull request #14234: Perform hashing in CollectLeft HashJoin in parallel URL: https://github.com/apache/datafusion/pull/14234

Re: [PR] Perform hashing in CollectLeft HashJoin in parallel [datafusion]

2025-01-25 Thread via GitHub
ctsk commented on PR #14234: URL: https://github.com/apache/datafusion/pull/14234#issuecomment-2614112816 I plan to test this again with a larger TPC-H scale factor and compare CollectLeft (parallel hashing) vs CollectLeft (main branch) vs repartition joins - On SF=1, CollectLeft alrea

[PR] Perform hashing in CollectLeft HashJoin in parallel [datafusion]

2025-01-22 Thread via GitHub
ctsk opened a new pull request, #14234: URL: https://github.com/apache/datafusion/pull/14234 ## Which issue does this PR close? This PR is an experiment to perform the hashing part of CollectLeft joins in parallel. Instead of directly coalescing the partitions on the build sid