AlanConfluent commented on PR #24992:
URL: https://github.com/apache/flink/pull/24992#issuecomment-2287466371

   > I have't spend too much thought on this but some pointers to verify that 
the rewrite is actually correct: 
https://stackoverflow.com/questions/55893354/predicate-pushdown-vs-on-clause We 
should definitely add more tests also for the ON clause.
   
   Thanks for that link.  I read through this quite a bit and here are my 
understandings:
   - ON clauses are evaluated "before the join", so if for example, there's a 
LEFT JOIN and the right row doesn't match the ON clause, the left row can still 
join with null.  These clauses show up at the join operator or can be pushed 
down to a side if possible.
   - WHERE clauses are evaluated "after the join", and are safe to pull up 
right after a join operator.
   
   The main thing which we do not want is to take all ON clauses and pull them 
up after the join because you're turning all join types into INNER JOINS.  For 
this reason, I put in a check that ensures that this logic only runs for INNER 
JOINS only, and will otherwise throw a TableException if an 
`AsyncScalarFunction` call ends up in the ON clause without being an INNER JOIN.
   
   This behavior is consistent with python and explains the behavior.
   
   I added more test cases for the combinations of:
   - inner join
   - right join
   - left join
   And for function call types:
   - Single param (will try to push down if in ON)
   - Params from both tables (will keep as join condition if in ON)
   And for call location:
   - WHERE clause
   - ON clause


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to