jonathanc-n commented on issue #17267:
URL: https://github.com/apache/datafusion/issues/17267#issuecomment-3290012901
@alamb @2010YOUY01 There is a problem with the HJ -> SMJ runtime switch. As
far as I know, there is no way to enforce sorting and repartitioning at
runtime.
The solution I was thinking of was to:
- During physical planning, pass in a flag indicating whether to allow the
switch from HJ to SMJ at runtime.
- If that flag is true, the hash join's required input ordering enforces a
sort on its inputs.
- If the hash join detects that it needs to spill during
`collect_left_input`, it will spill and let SMJ pick up the work. If no spill
is needed, it will continue as usual.
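
The steps above can be sketched as a toy decision model. This is only an illustration of the control flow, not DataFusion code: `JoinPath`, `Outcome`, `run_join`, and the byte-budget comparison used as a stand-in for spill detection are all hypothetical names and assumptions, and returning an error when no fallback is allowed is a modeling choice.

```rust
// Toy model of the proposed HJ -> SMJ runtime switch.
// Every name here is hypothetical; none of it is DataFusion API.

#[derive(Debug, PartialEq)]
enum JoinPath {
    HashJoin,
    SortMergeJoin,
}

#[derive(Debug, PartialEq)]
struct Outcome {
    /// Which join ends up producing the result.
    path: JoinPath,
    /// Whether the inputs were sorted up front, regardless of need.
    paid_for_sort: bool,
}

/// `allow_runtime_switch` stands in for the planner flag (first bullet).
/// Comparing `build_side_bytes` against `memory_budget_bytes` stands in for
/// spill detection while collecting the build side (third bullet).
fn run_join(
    allow_runtime_switch: bool,
    build_side_bytes: usize,
    memory_budget_bytes: usize,
) -> Result<Outcome, String> {
    // Second bullet: the sort is enforced at plan time whenever the flag is
    // set, so its cost is paid before we know whether a spill will happen.
    let paid_for_sort = allow_runtime_switch;
    let needs_spill = build_side_bytes > memory_budget_bytes;

    match (needs_spill, allow_runtime_switch) {
        // Build side fits in memory: the hash join continues as usual.
        (false, _) => Ok(Outcome { path: JoinPath::HashJoin, paid_for_sort }),
        // Build side does not fit and the switch is allowed: hand off to SMJ.
        (true, true) => Ok(Outcome { path: JoinPath::SortMergeJoin, paid_for_sort }),
        // Build side does not fit and no fallback exists (assumption of this model).
        (true, false) => Err("build side exceeds memory budget and no fallback is allowed".to_string()),
    }
}
```

In this model, `run_join(true, 10, 100)` stays on the hash join path but still reports `paid_for_sort: true`, which is the case where the sort is wasted work.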
This solution doesn't make much sense, since even when there is no need to
spill we would pay the cost of sorting on top of the regular hash join
execution. We should just tell users to set the `prefer_hash_join` flag to
false if they need spilling for joins.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]