crystalxyz commented on code in PR #17518:
URL: https://github.com/apache/datafusion/pull/17518#discussion_r2400014974


##########
datafusion/common/src/join_type.rs:
##########
@@ -74,6 +74,12 @@ pub enum JoinType {
     RightMark,
 }
 
+const LEFT_PRESERVING: &[JoinType] =
+    &[JoinType::Left, JoinType::Full, JoinType::LeftMark];

Review Comment:
   The left join handling might still be wrong here. We will return true for 
`preserves_left` and push down the right side.
   
   This makes me think if it we should tie preservation to the correctness of 
filter pushdown, as there are cases that preservation doesn't guarantee the 
safety to push down filters. Would it be cleaner if we just hardcode rules for 
filter pushdown, and in that way we can also handle LeftSemi/LeftAnti nicely?



##########
datafusion/physical-plan/src/joins/hash_join/exec.rs:
##########
@@ -84,6 +86,29 @@ use parking_lot::Mutex;
 const HASH_JOIN_SEED: RandomState =
     RandomState::with_seeds('J' as u64, 'O' as u64, 'I' as u64, 'N' as u64);
 
+fn dynamic_filter_pushdown_side(join_type: JoinType) -> JoinSide {
+    use JoinSide::*;
+
+    let preserves_left = join_type.preserves_left();
+    let preserves_right = join_type.preserves_right();
+
+    match (preserves_left, preserves_right) {
+        (true, true) => None,
+        (true, false) => Right,
+        (false, true) => Left,
+        (false, false) => match join_type {
+            // Semi/anti joins do not preserve the build-side rows, but we 
still
+            // want dynamic filters that reference those rows to run there.  By
+            // keeping them on the non-preserving side we avoid pushing the
+            // filter to the opposite input and incorrectly filtering the
+            // non-preserved (outer) rows instead.
+            JoinType::LeftSemi | JoinType::LeftAnti => Left,
+            JoinType::RightSemi | JoinType::RightAnti => Right,
+            _ => Right,

Review Comment:
   The default pattern matching is not ideal because we rely on the implication 
that only Semi or Anti joins will fall under this `(false, false)` branch. We 
could reorganize the code to avoid it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to