[ 
https://issues.apache.org/jira/browse/CALCITE-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947529#comment-17947529
 ] 

suibianwanwan commented on CALCITE-6962:
----------------------------------------

Upon further analysis, I realized the issue was caused by ignoring {{NULL}} 
equality when handling {*}Correlate{*}.

As described in the formula derivation in *Unnesting Arbitrary Queries*: _"Note 
that unless indicated otherwise this operator has IS semantics, i.e., it 
compares NULL values as equal."_
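
The difference between plain {{=}} and IS semantics can be sketched in a few lines of Python. This is only a model of SQL three-valued logic, not Calcite code: {{None}} stands for NULL, and the helper names {{sql_eq}}, {{is_not_distinct_from}} and {{semi_join}} are hypothetical. The two sample rows are (EMPNO, COMM) pairs taken from the issue description.

```python
from typing import Optional


def sql_eq(a, b) -> Optional[bool]:
    """Model of SQL `=`: the result is UNKNOWN (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a, b) -> bool:
    """Model of SQL `IS NOT DISTINCT FROM`: NULLs compare as equal, never UNKNOWN."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def semi_join(left, right, cmp):
    """Keep each left row that matches some right row on every key column.

    A match requires cmp to return exactly True; UNKNOWN (None) does not match.
    """
    return [
        l for l in left
        if any(all(cmp(x, y) is True for x, y in zip(l, r)) for r in right)
    ]


# (EMPNO, COMM) for SMITH and ALLEN; None stands for a NULL COMM.
rows = [(7369, None), (7499, 300.00)]

with_eq = semi_join(rows, rows, sql_eq)                 # SMITH is dropped
with_is = semi_join(rows, rows, is_not_distinct_from)   # both rows survive
```

With {{sql_eq}} the row whose COMM is NULL cannot match even itself, which mirrors the missing rows in the actual result; with the null-safe comparison it is kept.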

This is also reflected in the pseudocode from the paper *Improving Unnesting 
of Complex Queries*:
{code:java}
fun unnest(join, info, accessing):
  // Split accessing into accessingLeft and accessingRight for input of join
  if accessing(join) is not empty:
    dJoinElimination(join, info, accessing)
    return

  // Check if only one side accesses outer columns
  if accessingRight is empty and info.join cannot output unmatched from the right:
    unnest(join.left, info, accessingLeft)
    rewriteColumns(join.condition, info)
    return
  if accessingLeft is empty and info.join cannot output unmatched from the left:
    unnest(join.right, info, accessingRight)
    rewriteColumns(join.condition, info)
    return

  // Unnest both sides
  unnestingLeft = new Unnesting(info.info)
  unnestingRight = new Unnesting(info.info)
  unnest(join.left, unnestLeft, accessingLeft)
  unnest(join.right, unnestRight, accessingRight)
  rewriteColumnsForJoin(join.condition, unnestingLeft, unnestingRight)

  for c in info.outerRefs:
    add "{unnestLeft.repr[c]} is not distinct from {unnestRight.repr[c]}" to join.condition
  merge cclasses and repr from unnestLeft and unnestRight into info
{code}
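
The final loop of the pseudocode, which adds one null-safe equality per outer reference to the join condition, might look like the following minimal Python sketch. It builds the condition as a plain string for illustration only; the function names, the {{repr_left}}/{{repr_right}} dictionaries and the column names are all assumptions and do not correspond to Calcite classes or to the paper's data structures.

```python
def outer_ref_conditions(outer_refs, repr_left, repr_right):
    """Return one null-safe equality predicate per outer reference.

    repr_left / repr_right map each outer reference to the column that
    represents it on the left / right side after unnesting (hypothetical).
    """
    return [
        f"{repr_left[c]} IS NOT DISTINCT FROM {repr_right[c]}"
        for c in outer_refs
    ]


def extend_condition(condition, extra):
    """AND the extra predicates onto an existing join condition string."""
    return " AND ".join([condition, *extra]) if extra else condition


# Hypothetical outer references and their per-side representatives.
outer_refs = ["e.empno", "e.comm"]
repr_left = {"e.empno": "l.empno", "e.comm": "l.comm"}
repr_right = {"e.empno": "r.empno", "e.comm": "r.comm"}

cond = extend_condition(
    "TRUE", outer_ref_conditions(outer_refs, repr_left, repr_right)
)
```

The point of the sketch is only that the added predicates use {{IS NOT DISTINCT FROM}} rather than {{=}}, so a NULL outer value still joins with its own representative on the other side.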

> Exists subquery returns incorrect result when or condition involves null column
> -------------------------------------------------------------------------------
>
>                 Key: CALCITE-6962
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6962
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: suibianwanwan
>            Assignee: suibianwanwan
>            Priority: Major
>
> Test in sub-query.iq
> {code:java}
> select *
> from "scott".emp as e
> where exists (
>   select empno
>   from "scott".emp as ee
>   where e.empno = ee.empno or e.comm >= ee.sal
> )
> {code}
> Expected result:
> {code:java}
> +-------+--------+-----------+------+------------+---------+---------+--------+
> | EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
> +-------+--------+-----------+------+------------+---------+---------+--------+
> |  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |         |     20 |
> |  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 | 1600.00 |  300.00 |     30 |
> |  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 | 1250.00 |  500.00 |     30 |
> |  7566 | JONES  | MANAGER   | 7839 | 1981-02-04 | 2975.00 |         |     20 |
> |  7654 | MARTIN | SALESMAN  | 7698 | 1981-09-28 | 1250.00 | 1400.00 |     30 |
> |  7698 | BLAKE  | MANAGER   | 7839 | 1981-01-05 | 2850.00 |         |     30 |
> |  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 | 2450.00 |         |     10 |
> |  7788 | SCOTT  | ANALYST   | 7566 | 1987-04-19 | 3000.00 |         |     20 |
> |  7839 | KING   | PRESIDENT |      | 1981-11-17 | 5000.00 |         |     10 |
> |  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 | 1500.00 |    0.00 |     30 |
> |  7876 | ADAMS  | CLERK     | 7788 | 1987-05-23 | 1100.00 |         |     20 |
> |  7900 | JAMES  | CLERK     | 7698 | 1981-12-03 |  950.00 |         |     30 |
> |  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 | 3000.00 |         |     20 |
> |  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |         |     10 |
> +-------+--------+-----------+------+------------+---------+---------+--------+
> (14 rows)
> {code}
> Actual result:
> {code:java}
> +-------+--------+----------+------+------------+---------+---------+--------+
> | EMPNO | ENAME  | JOB      | MGR  | HIREDATE   | SAL     | COMM    | DEPTNO |
> +-------+--------+----------+------+------------+---------+---------+--------+
> |  7499 | ALLEN  | SALESMAN | 7698 | 1981-02-20 | 1600.00 |  300.00 |     30 |
> |  7521 | WARD   | SALESMAN | 7698 | 1981-02-22 | 1250.00 |  500.00 |     30 |
> |  7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250.00 | 1400.00 |     30 |
> |  7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500.00 |    0.00 |     30 |
> +-------+--------+----------+------+------------+---------+---------+--------+
> {code}
> Plan:
> {code:java}
> EnumerableSort(sort0=[$0], sort1=[$5], dir0=[ASC], dir1=[ASC])
>   EnumerableHashJoin(condition=[AND(=($0, $10), =($6, $11))], joinType=[semi])
>     EnumerableTableScan(table=[[scott, EMP]])
>     EnumerableNestedLoopJoin(condition=[OR(=($2, $0), >=($3, $1))], joinType=[inner])
>       EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], SAL=[$t5])
>         EnumerableTableScan(table=[[scott, EMP]])
>       EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
>         EnumerableTableScan(table=[[scott, EMP]])
> {code}
> Rows where {{COMM}} is null were excluded from the result set due to the join 
> condition. This is likely caused by a semantic mismatch between the original 
> {{EXISTS}} clause and its translated semi-join form:
> {code:java}
> EnumerableHashJoin(condition=[AND(=($0, $10), =($6, $11))], joinType=[semi]){code}
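> If {{comm}} is null,
> {code:java}
> e.empno = ee.empno or e.comm >= ee.sal{code}
> still evaluates to true for the row where {{e.empno = ee.empno}}, but
> {code:java}
> AND(=($0, $10), =($6, $11)){code}
> never evaluates to true, because {{NULL = NULL}} is UNKNOWN, so the semi-join 
> rejects the row.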



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
