Dandandan commented on code in PR #19635:
URL: https://github.com/apache/datafusion/pull/19635#discussion_r2675462938


##########
datafusion/physical-optimizer/src/join_selection.rs:
##########
@@ -232,19 +237,33 @@ pub(crate) fn partitioned_hash_join(
 ) -> Result<Arc<dyn ExecutionPlan>> {
     let left = hash_join.left();
     let right = hash_join.right();
-    if hash_join.join_type().supports_swap() && should_swap_join_order(&**left, &**right)?
+    // Don't swap null-aware anti joins as they have specific side requirements
+    if hash_join.join_type().supports_swap()
+        && !hash_join.null_aware
+        && should_swap_join_order(&**left, &**right)?
     {
         hash_join.swap_inputs(PartitionMode::Partitioned)
     } else {
+        // Null-aware anti joins must use CollectLeft mode because they track probe-side state
+        // (probe_side_non_empty, probe_side_has_null) per-partition, but need global knowledge
+        // for correct null handling. With partitioning, a partition might not see probe rows
+        // even if the probe side is globally non-empty, leading to incorrect NULL row handling.
+        let partition_mode = if hash_join.null_aware {
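The comment in the diff argues that null-aware anti join state must be global. A minimal, self-contained sketch makes this concrete: it evaluates `NOT IN` semantics once over the whole input, and again independently per hash partition. The functions `null_aware_anti`, `partition_of`, and `partitioned` are hypothetical illustrations, not DataFusion APIs, and the partitioner (NULL keys to partition 0, other keys to `k mod n`) is an assumption chosen for the demo.

```rust
/// Hypothetical reference semantics of `build_key NOT IN (probe keys)`,
/// i.e. a null-aware anti join: returns the build-side keys that survive.
fn null_aware_anti(build: &[Option<i64>], probe: &[Option<i64>]) -> Vec<Option<i64>> {
    let probe_side_non_empty = !probe.is_empty();
    let probe_side_has_null = probe.iter().any(|k| k.is_none());
    build
        .iter()
        .copied()
        .filter(|b| {
            if !probe_side_non_empty {
                // `x NOT IN (empty set)` is TRUE, even when x is NULL.
                return true;
            }
            if probe_side_has_null {
                // A NULL in the set makes `x NOT IN (...)` FALSE or NULL: drop.
                return false;
            }
            match b {
                // `NULL NOT IN (non-empty set)` is NULL: drop.
                None => false,
                Some(v) => !probe.contains(&Some(*v)),
            }
        })
        .collect()
}

/// Hypothetical hash partitioner: NULL keys go to partition 0,
/// non-NULL keys to `k mod n`.
fn partition_of(key: Option<i64>, n: usize) -> usize {
    match key {
        None => 0,
        Some(k) => k.rem_euclid(n as i64) as usize,
    }
}

/// Evaluates the same rules, but each partition only sees its own
/// slice of the probe side (local probe_side_non_empty / has_null).
fn partitioned(build: &[Option<i64>], probe: &[Option<i64>], n: usize) -> Vec<Option<i64>> {
    let mut out = Vec::new();
    for p in 0..n {
        let b: Vec<_> = build.iter().copied().filter(|k| partition_of(*k, n) == p).collect();
        let pr: Vec<_> = probe.iter().copied().filter(|k| partition_of(*k, n) == p).collect();
        out.extend(null_aware_anti(&b, &pr));
    }
    out
}

fn main() {
    let build = vec![Some(1), Some(2), Some(3)];
    let probe = vec![Some(2), None];
    let global = null_aware_anti(&build, &probe);
    let split = partitioned(&build, &probe, 2);
    println!("global (CollectLeft-like): {:?}", global);
    println!("per-partition:             {:?}", split);
    assert_ne!(global, split);
}
```

With `build = [1, 2, 3]` and `probe = [2, NULL]`, the global evaluation drops every row because the probe side contains a NULL. Partition 1 never sees that NULL (it sees no probe rows at all), so it treats its probe side as empty and keeps `1` and `3`.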

Review Comment:
   Can we avoid falling back to `CollectLeft` when the join keys are not nullable, or is this done already?
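The reviewer's suggestion could look roughly like the sketch below: only force `CollectLeft` when a join key on either side can actually be NULL. If no key is nullable, `probe_side_has_null` can never be set and there are no NULL build keys, so the null-aware rules collapse to those of an ordinary anti join, which is safe to hash-partition. Both sides are checked because a nullable build key (local empty vs. non-empty probe) and a nullable probe key (local NULL present or not) can each make a partition's decision differ from the global one. `Mode`, `KeyPair`, and `mode_for_null_aware` are hypothetical stand-ins, not DataFusion's `PartitionMode` or any existing helper; in DataFusion the nullability would presumably be derived from the key expressions against the input schemas.

```rust
/// Hypothetical stand-in for the two partition modes relevant here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Partitioned,
    CollectLeft,
}

/// Hypothetical summary of one equi-join key pair: can each side be NULL?
struct KeyPair {
    left_nullable: bool,
    right_nullable: bool,
}

/// Hypothetical helper: fall back to CollectLeft only when NULL semantics
/// could differ between a global and a per-partition view.
fn mode_for_null_aware(null_aware: bool, keys: &[KeyPair]) -> Mode {
    let any_nullable = keys.iter().any(|k| k.left_nullable || k.right_nullable);
    if null_aware && any_nullable {
        // NULL handling needs global probe-side state.
        Mode::CollectLeft
    } else {
        // No NULL keys are possible: equivalent to a plain anti join.
        Mode::Partitioned
    }
}

fn main() {
    let non_null = [KeyPair { left_nullable: false, right_nullable: false }];
    let nullable_probe = [KeyPair { left_nullable: false, right_nullable: true }];
    assert_eq!(mode_for_null_aware(true, &non_null), Mode::Partitioned);
    assert_eq!(mode_for_null_aware(true, &nullable_probe), Mode::CollectLeft);
    assert_eq!(mode_for_null_aware(false, &nullable_probe), Mode::Partitioned);
    println!("ok");
}
```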



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.