[ https://issues.apache.org/jira/browse/CALCITE-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937596#comment-17937596 ]
Zhen Chen commented on CALCITE-6904: ------------------------------------ There are some issues involved here: # Using IS_NOT_DISTINCT_FROM as HashJoin condition, it can not get right result. # IS_NOT_DISTINCT_FROM is directly used as the condition of NestedLoopJoin, we will also meet an error. Because we have not implemented IsNotDistinctFromImplementor. Here I commented out the generation of HashJoin to force it to use NestedLoopJoin. Sorry, I didn't find an example that can directly generate a NestedLoopJoin case. Plan is: EnumerableProject(commission=[$0]) EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[inner]) EnumerableProject(commission=[$4]) EnumerableTableScan(table=[[hr, emps]]) EnumerableProject(commission=[$4]) EnumerableTableScan(table=[[hr, emps]]) Error while executing SQL "select "t1"."commission" from "hr"."emps" as "t1" join "hr"."emps" as "t2" on "t1"."commission" is not distinct from "t2"."commission"": Unable to implement EnumerableCalc(expr#0..1=[\{inputs}], commission=[$t0]): rowcount = 2500.0, cumulative cost = \{27900.0 rows, 8902.0 cpu, 0.0 io}, id = 142 EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[inner]): rowcount = 2500.0, cumulative cost = \{25400.0 rows, 1402.0 cpu, 0.0 io}, id = 138 EnumerableCalc(expr#0..4=[\{inputs}], commission=[$t4]): rowcount = 100.0, cumulative cost = \{200.0 rows, 701.0 cpu, 0.0 io}, id = 144 EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 119 EnumerableCalc(expr#0..4=[\{inputs}], commission=[$t4]): rowcount = 100.0, cumulative cost = \{200.0 rows, 701.0 cpu, 0.0 io}, id = 146 EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 119 # {{t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)}}will be folded into {{t1.key IS NOT DISTINCT FROM t2.key}} (using collapseExpandedIsNotDistinctFromExpr). This makes it impossible to execute the original OR expression. # Currently, calcite recommends using IS_NOT_DISTINCT_FROM, which will be treated as an EQUAL condition, but the difference from the EQUAL condition is the handling of NULL. When encountering a join condition of IS_NOT_DISTINCT_FROM, HashJoin will be used, but the null information saved by {{filterNulls}} will be omitted during the conversion process. Some suggestions: # Disable IS_NOT_DISTINCT_FROM fold. These can use OR expression to execute correctly. # Implemented IsNotDistinctFromImplementor and IS_NOT_DISTINCT_FROM is not considered as an EQUAL condition, let IS_NOT_DISTINCT_FROM execute correctly. # Add NULL_AWARE parameter to make HashJoin aware of null handling behavior. The reason why this problem is found is that there is no case in jdbctest about using IS_NOT_DISTINCT_FROM as a join condition.I have also read some related PRs, but I may still not understand the full picture of calcite execution, and I may need some advice from you. > IS_NOT_DISTINCT_FROM is converted error in EnumerableJoinRule > ------------------------------------------------------------- > > Key: CALCITE-6904 > URL: https://issues.apache.org/jira/browse/CALCITE-6904 > Project: Calcite > Issue Type: Bug > Reporter: Zhen Chen > Assignee: Zhen Chen > Priority: Major > Attachments: image-2025-03-21-13-06-47-354.png > > > The convert() method of EnumerableJoinRule incorrectly converts the condition > of IS_NOT_DISTINCT_FROM , which is considered an equivalence condition in > calcite and carries a `filterNulls` to save whether to perform equivalence > processing on NULL. However, when converting to EnumerableJoin, the > filterNulls are ignored, resulting in `IS_NOT_DISTINCT_FROM ($0, $1)` being > converted to `=($0, $1)`. > > mysql case: > {code:java} > CREATE TABLE emps ( > commission int > ); > INSERT INTO emps VALUES (1); > INSERT INTO emps VALUES (2); > INSERT INTO emps VALUES (3); > INSERT INTO emps VALUES (NULL); > SELECT t1.commission FROM emps as t1 > join > emps as t2 > on t1.commission <=> t2.commission; > mysql result: > commission > 1 > 2 > 3 > NULL {code} > calcite jdbc case: > {code:java} > @Test void testIsNotDistinctFrom1() { > final String sql = "" > + "select \"t1\".\"commission\" from \"hr\".\"emps\" as \"t1\"\n" > + "join\n" > + "\"hr\".\"emps\" as \"t2\"\n" > + "on \"t1\".\"commission\" is not distinct from > \"t2\".\"commission\""; > CalciteAssert.hr() > .query(sql) > .explainContains("") > .returnsUnordered("commission=500", > "commission=null"); > } {code} > debug result: > !image-2025-03-21-13-06-47-354.png|width=519,height=649! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)