Silun Dong created CALCITE-7482:
-----------------------------------

             Summary: Wrong variablesSet used when rewriting subquery in JOIN 
ON clause
                 Key: CALCITE-7482
                 URL: https://issues.apache.org/jira/browse/CALCITE-7482
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.41.0
            Reporter: Silun Dong
            Assignee: Silun Dong
             Fix For: 1.42.0


 
{code:java}
// query
WITH t1(id, sal) as (VALUES (1, 10), (2, 20), (3, 30)), t2(id, sal) as (VALUES 
(2, 20), (2, 200), (3, 30)), t3(id, sal) as (VALUES (3, 30), (4, 40)), t4(id, 
sal) as (VALUES (2, 200), (5, 50))
select * from 
t1, 
lateral(
  select t2.id, t2.sal from t2 join t3 on exists(
    select t4.id from t4 where t4.id=t1.id and t4.sal=t2.sal
  )
); 

// initial plan
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
    LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
    LogicalProject(ID=[$0], SAL=[$1])
      LogicalJoin(condition=[EXISTS({
LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, $cor1.SAL))])
  LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
})], joinType=[inner], variablesSet=[[$cor1]])          <- variablesSet of Join
        LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
        LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])

// after removing subquery
LogicalProject(ID=[$0], SAL=[$1], ID0=[$2], SAL0=[$3])
  LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
    LogicalValues(tuples=[[{ 1, 10 }, { 2, 20 }, { 3, 30 }]])
    LogicalProject(ID=[$0], SAL=[$1])
      LogicalProject(EXPR$0=[$0], EXPR$1=[$1], EXPR$00=[$3], EXPR$10=[$4])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor0, 
$cor1]])         <- here
            LogicalValues(tuples=[[{ 2, 20 }, { 2, 200 }, { 3, 30 }]])
            LogicalAggregate(group=[{0}])
              LogicalProject(i=[true])
                LogicalFilter(condition=[AND(=($0, $cor0.ID), =($1, 
$cor1.SAL))])
                  LogicalValues(tuples=[[{ 2, 200 }, { 5, 50 }]])
          LogicalValues(tuples=[[{ 3, 30 }, { 4, 40 }]])

// final output
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
|  2 |  20 |   2 |   20 |
|  2 |  20 |   2 |   20 |
|  2 |  20 |   2 |  200 |
|  2 |  20 |   2 |  200 |
|  2 |  20 |   3 |   30 |
|  2 |  20 |   3 |   30 |
+----+-----+-----+------+

// correct output (verified in PG)
+----+-----+-----+------+
| ID | SAL | ID0 | SAL0 |
+----+-----+-----+------+
|  2 |  20 |   2 |  200 |
|  2 |  20 |   2 |  200 |
+----+-----+-----+------+{code}
The subquery should have been rewritten as a {{Correlate}} with 
{{{}CorrelationId=$cor1{}}}, but it was rewritten as a {{Join}} instead; its 
{{variablesSet}} contains two {{{}CorrelationId{}}}, which is clearly wrong.

The root cause is that when rewriting a subquery within the ON clause, the 
program detected the {{CorrelationId}} used in the subquery but did not filter 
out the {{CorrelationId}} that did not belong only to the current scope.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to