adriangb commented on PR #16732: URL: https://github.com/apache/datafusion/pull/16732#issuecomment-3054703987
Note that I added a `HashJoinExec` implementation to motivate this PR but removed it in [5940cca](https://github.com/apache/datafusion/pull/16732/commits/5940cca7c8ca9620781664425fb45d66aeedd726) because it lacks the nuance needed for a complete, correct implementation (it doesn't take the join type into account, etc.).

I asked Claude to analyze the logical filter pushdown on joins, and it reported the following:

```
Join Types and Filter Push Down Rules

For WHERE clause filters (lr_is_preserved):
- Inner joins: Filters can be pushed to both sides
- Left joins: Filters can only be pushed to the left side
- Right joins: Filters can only be pushed to the right side
- Full joins: Filters cannot be pushed to either side
- Semi/Anti joins: Filters can be pushed to the preserved side only

For ON clause filters (on_lr_is_preserved):
- Inner joins: Filters can be pushed to both sides
- Left joins: Filters can only be pushed to the right side
- Right joins: Filters can only be pushed to the left side
- Full joins: Filters cannot be pushed to either side
- Semi/Anti joins: Different rules apply based on join variant

Filter Restrictions

The can_evaluate_as_join_condition function at line 255 shows that filters can only be converted to join conditions if they:

Allowed expressions:
- Column references
- Literals
- Placeholders
- Scalar variables
- Binary expressions
- LIKE/SimilarTo predicates
- NOT expressions
- IS NULL/IS NOT NULL
- CASE expressions
- Cast expressions
- Try cast expressions
- Scalar functions

Disallowed expressions:
- Subqueries (EXISTS, IN subquery, scalar subquery)
- Outer reference columns
- UNNEST expressions

Additionally, for non-inner joins, inferred predicates must strictly filter out NULLs to be pushed down to avoid incorrect results.
```
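To make the two tables in the report easier to compare, here is a rough sketch that writes them down as lookup functions. It is not DataFusion code: the `JoinType` enum is a local stand-in, and the function names `where_pushdown_sides` and `on_pushdown_sides` are made up for illustration. Two assumptions are baked in: "the preserved side" for semi/anti joins is read as the left side for the left variants and the right side for the right variants, and the ON-clause rules for semi/anti joins are left as `None` because the report does not spell them out.

```rust
/// Local stand-in for a join-type enum (hypothetical, NOT DataFusion's `JoinType`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

/// Returns `(left, right)`: whether a WHERE-clause filter may be pushed
/// below the join to that side, following the first table in the report.
pub fn where_pushdown_sides(join_type: JoinType) -> (bool, bool) {
    match join_type {
        JoinType::Inner => (true, true),
        JoinType::Left => (true, false),
        JoinType::Right => (false, true),
        JoinType::Full => (false, false),
        // Assumption: the "preserved side" is the side named by the variant.
        JoinType::LeftSemi | JoinType::LeftAnti => (true, false),
        JoinType::RightSemi | JoinType::RightAnti => (false, true),
    }
}

/// Returns `Some((left, right))`: whether an ON-clause filter may be pushed
/// to that side, following the second table in the report. Semi/anti joins
/// return `None` because the report only says that different rules apply.
pub fn on_pushdown_sides(join_type: JoinType) -> Option<(bool, bool)> {
    match join_type {
        JoinType::Inner => Some((true, true)),
        JoinType::Left => Some((false, true)),
        JoinType::Right => Some((true, false)),
        JoinType::Full => Some((false, false)),
        _ => None,
    }
}
```

For a left join, `where_pushdown_sides(JoinType::Left)` gives `(true, false)` while `on_pushdown_sides(JoinType::Left)` gives `Some((false, true))`. The two are mirrored because an ON filter on the left input of a left join cannot drop left rows (unmatched left rows are still emitted, padded with NULLs), whereas a WHERE filter on the left input can.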