Lino Rosa created CALCITE-7138:
----------------------------------

             Summary: RelToSqlConverterTest drops subquery boundary and 
inflates COUNT results
                 Key: CALCITE-7138
                 URL: https://issues.apache.org/jira/browse/CALCITE-7138
             Project: Calcite
          Issue Type: Bug
            Reporter: Lino Rosa


Take this input query:
{code:java}
SELECT
    COUNT(CASE WHEN has_salesman = 1 AND has_high_salary = 1 THEN 1 END) AS 
depts_with_salesman_and_high_salary,
    COUNT(CASE WHEN has_salesman = 1 THEN 1 END) AS depts_with_salesman,
    COUNT(CASE WHEN has_high_salary = 1 THEN 1 END) AS depts_with_high_salary
FROM (
    SELECT
        MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END) AS 
has_salesman,
        MAX(CASE WHEN "salary" > 1250 THEN 1 ELSE 0 END) AS has_high_salary
    FROM "foodmart"."employee"
    GROUP BY "department_id"
) AS dept_flags;{code}
After going through `SqlToRelConverter`, the resulting SQL will have the GROUP 
BY from the subquery inlined into a single query:
{code:java}
SELECT 
    COUNT(
        CASE 
            WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END) 
= 1
             AND MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000 
THEN 1 ELSE 0 END) = 1
            THEN 1 
            ELSE NULL 
        END
    ) AS "DEPTS_WITH_SALESMAN_AND_HIGH_SALARY",
    COUNT(
        CASE 
            WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END) 
= 1 
            THEN 1 
            ELSE NULL 
        END
    ) AS "DEPTS_WITH_SALESMAN",
    COUNT(
        CASE 
            WHEN MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000 
THEN 1 ELSE 0 END) = 1 
            THEN 1 
            ELSE NULL 
        END
    ) AS "DEPTS_WITH_HIGH_SALARY"
FROM "foodmart"."employee"
GROUP BY "department_id";
{code}
This is not equivalent to the first query. 
The first query counts departments, while the second—due to the GROUP BY 
department_id and how COUNT(...) is used—counts rows (employees) inside each 
qualifying department.

For example, for this dataset:
||department_id||position_title||salary||
|10|SALESMAN|1000|
|10|CLERK|3000|
|20|MANAGER|2000|
|30|SALESMAN|1100|

 

The first query results in:
||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
|1|2|2|

 

While the second query returns {*}one row per department{*}, and the numbers 
are row counts, not department counts:
||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
|2|2|2|
|0|0|1|
|0|1|0|

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to