viirya commented on code in PR #19635:
URL: https://github.com/apache/datafusion/pull/19635#discussion_r2676937415


##########
datafusion/physical-optimizer/src/join_selection.rs:
##########
@@ -232,19 +237,33 @@ pub(crate) fn partitioned_hash_join(
 ) -> Result<Arc<dyn ExecutionPlan>> {
     let left = hash_join.left();
     let right = hash_join.right();
-    if hash_join.join_type().supports_swap() && should_swap_join_order(&**left, &**right)?
+    // Don't swap null-aware anti joins as they have specific side requirements
+    if hash_join.join_type().supports_swap()
+        && !hash_join.null_aware
+        && should_swap_join_order(&**left, &**right)?
     {
         hash_join.swap_inputs(PartitionMode::Partitioned)
     } else {
+        // Null-aware anti joins must use CollectLeft mode because they track probe-side state
+        // (probe_side_non_empty, probe_side_has_null) per-partition, but need global knowledge
+        // for correct null handling. With partitioning, a partition might not see probe rows
+        // even if the probe side is globally non-empty, leading to incorrect NULL row handling.
+        let partition_mode = if hash_join.null_aware {

Review Comment:
   Yes! I added an optimization that detects when the join keys are non-nullable and avoids setting `null_aware = true` in those cases.
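A minimal sketch of the kind of check described in this reply. It assumes (this is not the PR's code) that null-aware handling is only needed when some join key on either side is nullable. `KeyColumn`, `requires_null_aware`, and `plain_anti_join` are hypothetical names for illustration. When neither side can produce NULLs, `NOT IN` reduces to an ordinary anti join: whether a build row survives depends only on probe rows with the same key, so no global probe-side flags are required.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the schema info of one join key column.
#[derive(Debug, Clone, Copy)]
struct KeyColumn {
    nullable: bool,
}

/// Assumed rule: null-aware semantics are only needed if any key column
/// on either input may contain NULLs.
fn requires_null_aware(build_keys: &[KeyColumn], probe_keys: &[KeyColumn]) -> bool {
    build_keys.iter().chain(probe_keys.iter()).any(|c| c.nullable)
}

/// Ordinary anti join on non-nullable keys: keep build keys with no match.
/// Each key only needs to see probe rows with the same value, so this can
/// run independently per hash partition.
fn plain_anti_join(build: &[i32], probe: &[i32]) -> Vec<i32> {
    let probe_keys: HashSet<i32> = probe.iter().copied().collect();
    build.iter().copied().filter(|k| !probe_keys.contains(k)).collect()
}
```

With non-nullable keys on both sides, `requires_null_aware` returns `false`, and the planner would be free to use a regular anti join (and, by the argument above, a partitioned one).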

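The reasoning in the diff comment (why null-aware anti joins need global probe-side state) can be checked with a small in-memory model of `NOT IN` semantics. This is a sketch, not DataFusion's implementation: keys are `Option<i32>` with `None` standing for NULL, and `partition_of` is a hypothetical hash partitioner. Running the same logic per partition, with only partition-local probe-side flags, can give a different answer from the global evaluation.

```rust
use std::collections::HashSet;

/// Reference semantics of `build.key NOT IN (probe keys)`:
/// returns the build-side keys that pass the filter.
fn null_aware_anti_join(build: &[Option<i32>], probe: &[Option<i32>]) -> Vec<Option<i32>> {
    // Probe-side flags, analogous to probe_side_non_empty / probe_side_has_null.
    let probe_side_non_empty = !probe.is_empty();
    let probe_side_has_null = probe.iter().any(|k| k.is_none());
    let probe_keys: HashSet<i32> = probe.iter().flatten().copied().collect();

    build
        .iter()
        .copied()
        .filter(|key| {
            if !probe_side_non_empty {
                // x NOT IN (empty set) is TRUE, even when x is NULL.
                return true;
            }
            if probe_side_has_null {
                // A NULL on the probe side makes the predicate FALSE or NULL.
                return false;
            }
            match key {
                // NULL NOT IN (non-empty set) is NULL, so the row is dropped.
                None => false,
                Some(k) => !probe_keys.contains(k),
            }
        })
        .collect()
}

/// Hypothetical hash partitioner used only for this illustration.
fn partition_of(key: Option<i32>, n: usize) -> usize {
    match key {
        None => 0,
        Some(k) => k.rem_euclid(n as i32) as usize,
    }
}

/// Evaluates each partition independently, so the probe-side flags are
/// partition-local, as they would be in Partitioned mode.
fn partitioned_null_aware_anti_join(
    build: &[Option<i32>],
    probe: &[Option<i32>],
    n: usize,
) -> Vec<Option<i32>> {
    let mut out = Vec::new();
    for p in 0..n {
        let b: Vec<Option<i32>> = build.iter().copied().filter(|k| partition_of(*k, n) == p).collect();
        let pr: Vec<Option<i32>> = probe.iter().copied().filter(|k| partition_of(*k, n) == p).collect();
        out.extend(null_aware_anti_join(&b, &pr));
    }
    out
}
```

With `build = [None, Some(1), Some(2)]`, `probe = [Some(1)]`, and two partitions, the global result is `[Some(2)]`. Partition 0, however, receives no probe rows, treats the probe side as empty, and wrongly keeps the NULL row, which is exactly the failure the comment describes.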


-- 
This is an automated message from the Apache Git Service.