alamb commented on code in PR #11203:
URL: https://github.com/apache/datafusion/pull/11203#discussion_r1662588836


##########
datafusion/optimizer/src/push_down_filter.rs:
##########
@@ -441,11 +442,11 @@ fn push_down_all_join(
 
     // Extract from OR clause, generate new predicates for both side of join 
if possible.
     // We only track the unpushable predicates above.
-    if left_preserved {
+    if on_left_preserved {
         left_push.extend(extract_or_clauses_for_join(&keep_predicates, 
left_schema));
         left_push.extend(extract_or_clauses_for_join(&join_conditions, 
left_schema));
     }
-    if right_preserved {
+    if on_right_preserved {

Review Comment:
   Thank you for the clarifications
   
   
   > In current version of the code, nothing from keep_predicates and 
join_conditions can be inserted to the left_push since on_left_preserved flag 
is false
   
   I agree, but I think this is required for correctness. At least for 
`join_conditions` it is important not to push them down to the preserved side 
   
   For example, in this query  I am pretty sure it is not correct to push the 
predicate on `A.a` below the join (the intuition is that all rows from `A` have 
to have at least one row out of the join). I believe this is what this PR 
fixes. 
   
   ```SQL
   SELECT .. FROM A LEFT JOIN B ON (A.a > 5)
   ```
   
   I think *is* correct to push predicates on B below (as any non matching rows 
would have been filtered in the join anyways)
   
   ```SQL
   SELECT .. FROM A LEFT JOIN B ON (B.b > 5)
   ```
   
   > We could have inserted additional conditions to the left_push. In this 
sense, current code works correctly, however behaves suboptimal in some cases.
   
   Can you help me write an example? I can add it to this PR and then we can 
handle improving the case in a future PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to