[ 
https://issues.apache.org/jira/browse/CALCITE-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated CALCITE-7127:
---------------------------------------
    Description: 
This issue is similar to CALCITE-6804; however, that ticket only partially 
resolved the underlying problem.

Consider the following query:
{code:java}
SELECT e3."product_id", e3."product_name"
FROM (
    SELECT 1 AS "additional_column",
           e1."product_id",
           e1."product_name"
    FROM   "foodmart"."product" e1
    LEFT JOIN "foodmart"."product" e2
    ON e1."product_id" = e2."product_id"
) AS e3
WHERE e3."product_name" IS NOT NULL AND NOT EXISTS (
    SELECT 1
    FROM "foodmart"."employee" e4
    WHERE e4."employee_id" = e3."additional_column"
) {code}
Calcite incorrectly converts it into the following SQL:
{code:java}
SELECT product_id, product_name
FROM (
  SELECT
    1 AS additional_column,
    product.product_id,
    product.product_name
  FROM foodmart.product
  LEFT JOIN foodmart.product AS product0
  ON product.product_id = product0.product_id
) AS t
WHERE t.product_name IS NOT NULL AND NOT EXISTS (
    SELECT *
    FROM foodmart.employee
    WHERE employee_id = product.product_class_id
); {code}
The problem is in the NOT EXISTS subquery's WHERE clause.

In the original query, the correlation condition is:
{code:java}
WHERE e4."employee_id" = e3."additional_column"{code}
In the generated query, this becomes: 
{code:java}
WHERE employee_id = product.product_class_id {code}
Calcite incorrectly generates a reference to product.product_class_id from the 
base table instead of t.additional_column from the derived table.

The generated logical plan is correct:
{code:java}
LogicalProject(product_id=[$1], product_name=[$2])
  LogicalFilter(condition=[AND(IS NOT NULL($2), NOT(EXISTS({
LogicalFilter(condition=[=($0, $cor0.additional_column)])
  JdbcTableScan(table=[[foodmart, employee]])
})))], variablesSet=[[$cor0]])
    LogicalProject(additional_column=[1], product_id=[$1], product_name=[$3])
      LogicalJoin(condition=[=($1, $16)], joinType=[left])
        JdbcTableScan(table=[[foodmart, product]])
        JdbcTableScan(table=[[foodmart, product]]) {code}

  was:
This issue is similar to [CALCITE-6804]; however, that ticket only partially 
resolved the underlying problem.

Consider the following query:
{code:java}
SELECT e3."product_id", e3."product_name"
FROM (
    SELECT 1 AS "additional_column",
           e1."product_id",
           e1."product_name"
    FROM   "foodmart"."product" e1
    LEFT JOIN "foodmart"."product" e2
    ON e1."product_id" = e2."product_id"
) AS e3
WHERE e3."product_name" IS NOT NULL AND NOT EXISTS (
    SELECT 1
    FROM "foodmart"."employee" e4
    WHERE e4."employee_id" = e3."additional_column"
) {code}
Calcite incorrectly converts it into the following SQL:
{code:java}
SELECT product_id, product_name
FROM (
  SELECT
    1 AS additional_column,
    product.product_id,
    product.product_name
  FROM foodmart.product
  LEFT JOIN foodmart.product AS product0
  ON product.product_id = product0.product_id
) AS t
WHERE t.product_name IS NOT NULL AND NOT EXISTS (
    SELECT *
    FROM foodmart.employee
    WHERE employee_id = product.product_class_id
); {code}
The problem is in the NOT EXISTS subquery's WHERE clause.

In the original query, the correlation condition is:
{code:java}
WHERE e4."employee_id" = e3."additional_column"{code}
In the generated query, this becomes: 
{code:java}
WHERE employee_id = product.product_class_id {code}
Calcite incorrectly generates a reference to product.product_class_id from the 
base table instead of t.additional_column from the derived table.

The generated logical plan is correct:
{code:java}
LogicalProject(product_id=[$1], product_name=[$2])
  LogicalFilter(condition=[AND(IS NOT NULL($2), NOT(EXISTS({
LogicalFilter(condition=[=($0, $cor0.additional_column)])
  JdbcTableScan(table=[[foodmart, employee]])
})))], variablesSet=[[$cor0]])
    LogicalProject(additional_column=[1], product_id=[$1], product_name=[$3])
      LogicalJoin(condition=[=($1, $16)], joinType=[left])
        JdbcTableScan(table=[[foodmart, product]])
        JdbcTableScan(table=[[foodmart, product]]) {code}
The root cause appears to be that the `SqlImplementor.Result` for a `Project` 
exposes the aliases of the relations used within that `Project`. Standard SQL 
syntax does not allow an outer query to reference these internal relations. 
The Project operation should encapsulate the relations it is built upon. In 
this case, the `Filter` should not have access to the relations inside the 
`Project` it consumes. However, the RelToSqlConverter seems to permit this, by 
transfering aliases from `Join` to `Project` causing it to resolve the 
correlated variable incorrectly. This appears to be where the bug originates. 

So solution would be to removing workaround which was introduced in 
CALCITE-6804 

And replace `return builder.result();` to `return result(builder.select, 
builder.clauses, e, null);` in 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L562


> Anti-join with WHERE NOT EXISTS syntax has corrupted condition
> --------------------------------------------------------------
>
>                 Key: CALCITE-7127
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7127
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.40.0
>            Reporter: Dmitry Sysolyatin
>            Assignee: Dmitry Sysolyatin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.41.0
>
>
> This issue is similar to CALCITE-6804; however, that ticket only partially 
> resolved the underlying problem.
> Consider the following query:
> {code:java}
> SELECT e3."product_id", e3."product_name"
> FROM (
>     SELECT 1 AS "additional_column",
>            e1."product_id",
>            e1."product_name"
>     FROM   "foodmart"."product" e1
>     LEFT JOIN "foodmart"."product" e2
>     ON e1."product_id" = e2."product_id"
> ) AS e3
> WHERE e3."product_name" IS NOT NULL AND NOT EXISTS (
>     SELECT 1
>     FROM "foodmart"."employee" e4
>     WHERE e4."employee_id" = e3."additional_column"
> ) {code}
> Calcite incorrectly converts it into the following SQL:
> {code:java}
> SELECT product_id, product_name
> FROM (
>   SELECT
>     1 AS additional_column,
>     product.product_id,
>     product.product_name
>   FROM foodmart.product
>   LEFT JOIN foodmart.product AS product0
>   ON product.product_id = product0.product_id
> ) AS t
> WHERE t.product_name IS NOT NULL AND NOT EXISTS (
>     SELECT *
>     FROM foodmart.employee
>     WHERE employee_id = product.product_class_id
> ); {code}
> The problem is in the NOT EXISTS subquery's WHERE clause.
> In the original query, the correlation condition is:
> {code:java}
> WHERE e4."employee_id" = e3."additional_column"{code}
> In the generated query, this becomes: 
> {code:java}
> WHERE employee_id = product.product_class_id {code}
> Calcite incorrectly generates a reference to product.product_class_id from 
> the base table instead of t.additional_column from the derived table.
> The generated logical plan is correct:
> {code:java}
> LogicalProject(product_id=[$1], product_name=[$2])
>   LogicalFilter(condition=[AND(IS NOT NULL($2), NOT(EXISTS({
> LogicalFilter(condition=[=($0, $cor0.additional_column)])
>   JdbcTableScan(table=[[foodmart, employee]])
> })))], variablesSet=[[$cor0]])
>     LogicalProject(additional_column=[1], product_id=[$1], product_name=[$3])
>       LogicalJoin(condition=[=($1, $16)], joinType=[left])
>         JdbcTableScan(table=[[foodmart, product]])
>         JdbcTableScan(table=[[foodmart, product]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to