alamb commented on code in PR #13560:
URL: https://github.com/apache/datafusion/pull/13560#discussion_r1870587465
##########
datafusion/core/src/physical_optimizer/sort_pushdown.rs:
##########
@@ -606,6 +611,102 @@ fn handle_custom_pushdown(
}
}
+// For hash join we only maintain the input order for the right child
+// for join type: Inner, Right, RightSemi, RightAnti
+fn handle_hash_join(
Review Comment:
I think it would be great eventually to get this kind of operator specific
logic into the operators (e.g. some method in HashJoinExec). Definitely not in
this PR, but having assumptions about the operator separate from its
implementation gives us a larger chance of introducing inconsistencies I think
##########
datafusion/sqllogictest/test_files/joins.slt:
##########
@@ -3160,10 +3160,10 @@ explain SELECT t1_id, t1_name, t1_int FROM
right_semi_anti_join_table_t2 t2 RIGH
----
physical_plan
01)SortPreservingMergeExec: [t1_id@0 ASC NULLS LAST]
-02)--SortExec: expr=[t1_id@0 ASC NULLS LAST], preserve_partitioning=[true]
-03)----CoalesceBatchesExec: target_batch_size=2
-04)------HashJoinExec: mode=CollectLeft, join_type=RightSemi, on=[(t2_id@0,
t1_id@0)], filter=t2_name@0 != t1_name@1
-05)--------MemoryExec: partitions=1, partition_sizes=[1]
+02)--CoalesceBatchesExec: target_batch_size=2
Review Comment:
These plans imply that the sort has been pushed into the second (probe)
input which makes sense I think :
https://docs.rs/datafusion/latest/datafusion/physical_plan/joins/struct.HashJoinExec.html
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]