Zhen Chen created CALCITE-7634:
----------------------------------
Summary: JoinExpandOrToUnionRule incorrectly expands OR branches
with non-equi predicates referencing both join inputs
Key: CALCITE-7634
URL: https://issues.apache.org/jira/browse/CALCITE-7634
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.42.0
Reporter: Zhen Chen
Assignee: Zhen Chen
Fix For: 1.43.0
{{JoinExpandOrToUnionRule}} may incorrectly split an OR branch that contains
both an equi-join predicate and a non-equi predicate referencing both join
inputs.
SQL:
{code:java}
select *
from EMP as p1
inner join EMP as p2
on (p1.empno = p2.empno and p1.sal < p2.sal)
or (p1.mgr = p2.mgr and p1.comm < p2.comm)
or p1.deptno = p2.deptno {code}
Before the fix, the rule expands the condition into 3 LogicalJoin branches:
{code:java}
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], EMPNO0=[$9], ENAME0=[$10],
JOB0=[$11], MGR0=[$12], HIREDATE0=[$13], SAL0=[$14], COMM0=[$15],
DEPTNO0=[$16], SLACKER0=[$17])
LogicalUnion(all=[true])
LogicalJoin(condition=[AND(=($0, $9), <($5, $14))], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalJoin(condition=[AND(=($3, $12), <($6, $15), OR(<>($0, $9), >=($5,
$14)))], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalJoin(condition=[AND(=($7, $16), OR(<>($0, $9), >=($5, $14)),
OR(<>($3, $12), >=($6, $15)))], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
This is unsafe because predicates such as p1.sal < p2.sal and p1.comm < p2.comm
reference columns from both join inputs and can evaluate to UNKNOWN under SQL
three-valued logic. The generated NOT conditions are not equivalent to
excluding previously matched branches in nullable cases.After the fix, the
unsafe branches remain grouped, and only the safe equi-join branch is split out:
{code:java}
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], EMPNO0=[$9], ENAME0=[$10],
JOB0=[$11], MGR0=[$12], HIREDATE0=[$13], SAL0=[$14], COMM0=[$15],
DEPTNO0=[$16], SLACKER0=[$17])
LogicalUnion(all=[true])
LogicalJoin(condition=[OR(AND(=($0, $9), <($5, $14)), AND(=($3, $12), <($6,
$15)))], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalJoin(condition=[AND(=($7, $16), OR(<>($0, $9), >=($5, $14)),
OR(<>($3, $12), >=($6, $15)))], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
Root cause
`RexInputRefCounter.visitInputRef` increments `leftFieldCount` instead of
`leftInputRefCount`, so predicates referencing the left input may be
miscounted.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)