AlanConfluent commented on PR #24992: URL: https://github.com/apache/flink/pull/24992#issuecomment-2287466371
> I have't spend too much thought on this but some pointers to verify that the rewrite is actually correct: https://stackoverflow.com/questions/55893354/predicate-pushdown-vs-on-clause We should definitely add more tests also for the ON clause. Thanks for that link. I read through this quite a bit and here are my understandings: - ON clauses are evaluated "before the join", so if for example, there's a LEFT JOIN and the right row doesn't match the ON clause, the left row can still join with null. These clauses show up at the join operator or can be pushed down to a side if possible. - WHERE clauses are evaluated "after the join", and are safe to pull up right after a join operator. The main thing which we do not want is to take all ON clauses and pull them up after the join because you're turning all join types into INNER JOINS. For this reason, I put in a check that ensures that this logic only runs for INNER JOINS only, and will otherwise throw a TableException if an `AsyncScalarFunction` call ends up in the ON clause without being an INNER JOIN. This behavior is consistent with python and explains the behavior. I added more test cases for the combinations of: - inner join - right join - left join And for function call types: - Single param (will try to push down if in ON) - Params from both tables (will keep as join condition if in ON) And for call location: - WHERE clause - ON clause -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org