alamb commented on code in PR #13560: URL: https://github.com/apache/datafusion/pull/13560#discussion_r1870587465
########## datafusion/core/src/physical_optimizer/sort_pushdown.rs: ########## @@ -606,6 +611,102 @@ fn handle_custom_pushdown( } } +// For hash join we only maintain the input order for the right child +// for join type: Inner, Right, RightSemi, RightAnti +fn handle_hash_join( Review Comment: I think it would be great eventually to get this kind of operator specific logic into the operators (e.g. some method in HashJoinExec). Definitely not in this PR, but having assumptions about the operator separate from its implementation gives us a larger chance of introducing inconsistencies I think ########## datafusion/sqllogictest/test_files/joins.slt: ########## @@ -3160,10 +3160,10 @@ explain SELECT t1_id, t1_name, t1_int FROM right_semi_anti_join_table_t2 t2 RIGH ---- physical_plan 01)SortPreservingMergeExec: [t1_id@0 ASC NULLS LAST] -02)--SortExec: expr=[t1_id@0 ASC NULLS LAST], preserve_partitioning=[true] -03)----CoalesceBatchesExec: target_batch_size=2 -04)------HashJoinExec: mode=CollectLeft, join_type=RightSemi, on=[(t2_id@0, t1_id@0)], filter=t2_name@0 != t1_name@1 -05)--------MemoryExec: partitions=1, partition_sizes=[1] +02)--CoalesceBatchesExec: target_batch_size=2 Review Comment: These plans imply that the sort has been pushed into the second (probe) input which makes sense I think : https://docs.rs/datafusion/latest/datafusion/physical_plan/joins/struct.HashJoinExec.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org