Dmitry Sysolyatin created CALCITE-7127: ------------------------------------------
Summary: Anti-join with WHERE NOT EXISTS syntax has corrupted condition Key: CALCITE-7127 URL: https://issues.apache.org/jira/browse/CALCITE-7127 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.40.0 Reporter: Dmitry Sysolyatin Assignee: Dmitry Sysolyatin Fix For: 1.41.0 This issue is similar to [CALCITE-6804]; however, that ticket only partially resolved the underlying problem. Consider the following query: {code:java} SELECT e3."product_id", e3."product_name" FROM ( SELECT 1 AS "additional_column", e1."product_id", e1."product_name" FROM "foodmart"."product" e1 LEFT JOIN "foodmart"."product" e2 ON e1."product_id" = e2."product_id" ) AS e3 WHERE e3."product_name" IS NOT NULL AND NOT EXISTS ( SELECT 1 FROM "foodmart"."employee" e4 WHERE e4."employee_id" = e3."additional_column" ) {code} Calcite incorrectly converts it into the following SQL: {code:java} SELECT product_id, product_name FROM ( SELECT 1 AS additional_column, product.product_id, product.product_name FROM foodmart.product LEFT JOIN foodmart.product AS product0 ON product.product_id = product0.product_id ) AS t WHERE t.product_name IS NOT NULL AND NOT EXISTS ( SELECT * FROM foodmart.employee WHERE employee_id = product.product_class_id ); {code} The problem is in the NOT EXISTS subquery's WHERE clause. In the original query, the correlation condition is: {code:java} WHERE e4."employee_id" = e3."additional_column"{code} In the generated query, this becomes: {code:java} WHERE employee_id = product.product_class_id {code} Calcite incorrectly generates a reference to product.product_class_id from the base table instead of t.additional_column from the derived table. The generated logical plan is correct: {code:java} LogicalProject(product_id=[$1], product_name=[$2]) LogicalFilter(condition=[AND(IS NOT NULL($2), NOT(EXISTS({ LogicalFilter(condition=[=($0, $cor0.additional_column)]) JdbcTableScan(table=[[foodmart, employee]]) })))], variablesSet=[[$cor0]]) LogicalProject(additional_column=[1], product_id=[$1], product_name=[$3]) LogicalJoin(condition=[=($1, $16)], joinType=[left]) JdbcTableScan(table=[[foodmart, product]]) JdbcTableScan(table=[[foodmart, product]]) {code} The root cause appears to be that the `SqlImplementor.Result` for a `Project` exposes the aliases of the relations used within that `Project`. Standard SQL syntax does not allow an outer query to reference these internal relations. The Project operation should encapsulate the relations it is built upon. In this case, the `Filter` should not have access to the relations inside the `Project` it consumes. However, the RelToSqlConverter seems to permit this, by transfering aliases from `Join` to `Project` causing it to resolve the correlated variable incorrectly. This appears to be where the bug originates. So solution would be to removing workaround which was introduced in CALCITE-6804 And replace `return builder.result();` to `return result(builder.select, builder.clauses, e, null);` in https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L562 -- This message was sent by Atlassian Jira (v8.20.10#820010)