[ 
https://issues.apache.org/jira/browse/CALCITE-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937596#comment-17937596
 ] 

Zhen Chen commented on CALCITE-6904:
------------------------------------

There are some issues involved here:
 # Using IS_NOT_DISTINCT_FROM as HashJoin condition, it can not get right 
result.

 # IS_NOT_DISTINCT_FROM is directly used as the condition of NestedLoopJoin, we 
will also meet an error. Because we have not implemented 
IsNotDistinctFromImplementor. Here I commented out the generation of HashJoin 
to force it to use NestedLoopJoin. Sorry, I didn't find an example that can 
directly generate a NestedLoopJoin case.

Plan is:
EnumerableProject(commission=[$0])
  EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], 
joinType=[inner])
    EnumerableProject(commission=[$4])
      EnumerableTableScan(table=[[hr, emps]])
    EnumerableProject(commission=[$4])
      EnumerableTableScan(table=[[hr, emps]])
​
​
Error while executing SQL "select "t1"."commission" from "hr"."emps" as "t1"
join
"hr"."emps" as "t2"
on "t1"."commission" is not distinct from "t2"."commission"": Unable to 
implement EnumerableCalc(expr#0..1=[\{inputs}], commission=[$t0]): rowcount = 
2500.0, cumulative cost = \{27900.0 rows, 8902.0 cpu, 0.0 io}, id = 142
  EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], 
joinType=[inner]): rowcount = 2500.0, cumulative cost = \{25400.0 rows, 1402.0 
cpu, 0.0 io}, id = 138
    EnumerableCalc(expr#0..4=[\{inputs}], commission=[$t4]): rowcount = 100.0, 
cumulative cost = \{200.0 rows, 701.0 cpu, 0.0 io}, id = 144
      EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative 
cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 119
    EnumerableCalc(expr#0..4=[\{inputs}], commission=[$t4]): rowcount = 100.0, 
cumulative cost = \{200.0 rows, 701.0 cpu, 0.0 io}, id = 146
      EnumerableTableScan(table=[[hr, emps]]): rowcount = 100.0, cumulative 
cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 119
​ # {{t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)}}will be folded 
into {{t1.key IS NOT DISTINCT FROM t2.key}} (using 
collapseExpandedIsNotDistinctFromExpr). This makes it impossible to execute the 
original OR expression.

 # Currently, calcite recommends using IS_NOT_DISTINCT_FROM, which will be 
treated as an EQUAL condition, but the difference from the EQUAL condition is 
the handling of NULL. When encountering a join condition of 
IS_NOT_DISTINCT_FROM, HashJoin will be used, but the null information saved by 
{{filterNulls}} will be omitted during the conversion process.

Some suggestions:
 # Disable IS_NOT_DISTINCT_FROM fold. These can use OR expression to execute 
correctly.

 # Implemented IsNotDistinctFromImplementor and IS_NOT_DISTINCT_FROM is not 
considered as an EQUAL condition, let IS_NOT_DISTINCT_FROM execute correctly.

 # Add NULL_AWARE parameter to make HashJoin aware of null handling behavior.

The reason why this problem is found is that there is no case in jdbctest about 
using IS_NOT_DISTINCT_FROM as a join condition.I have also read some related 
PRs, but I may still not understand the full picture of calcite execution, and 
I may need some advice from you.

> IS_NOT_DISTINCT_FROM is converted error in EnumerableJoinRule
> -------------------------------------------------------------
>
>                 Key: CALCITE-6904
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6904
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Zhen Chen
>            Assignee: Zhen Chen
>            Priority: Major
>         Attachments: image-2025-03-21-13-06-47-354.png
>
>
> The convert() method of EnumerableJoinRule incorrectly converts the condition 
> of IS_NOT_DISTINCT_FROM , which is considered an equivalence condition in 
> calcite and carries a `filterNulls` to save whether to perform equivalence 
> processing on NULL. However, when converting to EnumerableJoin, the 
> filterNulls are ignored, resulting in `IS_NOT_DISTINCT_FROM ($0, $1)` being 
> converted to `=($0, $1)`.
>  
> mysql case:
> {code:java}
> CREATE TABLE emps (
>     commission int
> );
> INSERT INTO emps VALUES (1);
> INSERT INTO emps VALUES (2);
> INSERT INTO emps VALUES (3);
> INSERT INTO emps VALUES (NULL);
> SELECT t1.commission FROM emps as t1
> join
> emps as t2
> on t1.commission <=> t2.commission;
> mysql result:
> commission
> 1
> 2
> 3
> NULL {code}
> calcite jdbc case:
> {code:java}
>   @Test void testIsNotDistinctFrom1() {
>     final String sql = ""
>         + "select \"t1\".\"commission\" from \"hr\".\"emps\" as \"t1\"\n"
>         + "join\n"
>         + "\"hr\".\"emps\" as \"t2\"\n"
>         + "on \"t1\".\"commission\" is not distinct from 
> \"t2\".\"commission\"";
>     CalciteAssert.hr()
>         .query(sql)
>         .explainContains("")
>         .returnsUnordered("commission=500",
>             "commission=null");
>   } {code}
> debug result:
> !image-2025-03-21-13-06-47-354.png|width=519,height=649!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to