[jira] [Commented] (CALCITE-7138) RelToSqlConverterTest drops subquery boundary and inflates COUNT results

Lino Rosa (Jira) Mon, 18 Aug 2025 12:15:06 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014715#comment-18014715
 ]


Lino Rosa commented on CALCITE-7138:
------------------------------------

Main suspect here is {{{}SqlImplementor#needNewSubQuery{}}}. We want it to 
return {{true}} on the upper Aggregate of this query, but it returns 
{{{}false{}}}.

One way it would return {{true}} is on this block:
{code:java}
if (rel instanceof Aggregate
    && (clauses.contains(Clause.ORDER_BY)
    || clauses.contains(Clause.FETCH)
    || clauses.contains(Clause.OFFSET))) {
  return true;
}
{code}
However at that point clauses only contains these:
{noformat}
0 = {SqlImplementor$Clause@7902} "FROM"
1 = {SqlImplementor$Clause@9293} "SELECT"
2 = {SqlImplementor$Clause@9340} "GROUP_BY"
3 = {SqlImplementor$Clause@9293} "SELECT"
{noformat}
----

Additionally, further down the same method there's this other block below.
{code:java}
if (clauses.contains(Clause.GROUP_BY)) {
  // Avoid losing the distinct attribute of inner aggregate.
  return !hasNestedAgg || Aggregate.isNotGrandTotal(agg);
}
{code}
In this case this still returns {{{}false{}}}.
I suspect this is the place I should change. Maybe jus having a GROUP BY should 
be sufficient to return {{true}} ({_}in which case I delete the block below and 
add {{clauses.contains(Clause.GROUP_By)}} to the block above?{_})


> RelToSqlConverterTest drops subquery boundary and inflates COUNT results
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-7138
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7138
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Lino Rosa
>            Priority: Major
>
> Take this input query:
> {code:java}
> SELECT
>     COUNT(CASE WHEN has_salesman = 1 AND has_high_salary = 1 THEN 1 END) AS 
> depts_with_salesman_and_high_salary,
>     COUNT(CASE WHEN has_salesman = 1 THEN 1 END) AS depts_with_salesman,
>     COUNT(CASE WHEN has_high_salary = 1 THEN 1 END) AS depts_with_high_salary
> FROM (
>     SELECT
>         MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END) AS 
> has_salesman,
>         MAX(CASE WHEN "salary" > 1250 THEN 1 ELSE 0 END) AS has_high_salary
>     FROM "foodmart"."employee"
>     GROUP BY "department_id"
> ) AS dept_flags;{code}
> After going through `SqlToRelConverter`, the resulting SQL will have the 
> GROUP BY from the subquery inlined into a single query:
> {code:java}
> SELECT 
>     COUNT(
>         CASE 
>             WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 
> END) = 1
>              AND MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000 
> THEN 1 ELSE 0 END) = 1
>             THEN 1 
>             ELSE NULL 
>         END
>     ) AS "DEPTS_WITH_SALESMAN_AND_HIGH_SALARY",
>     COUNT(
>         CASE 
>             WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 
> END) = 1 
>             THEN 1 
>             ELSE NULL 
>         END
>     ) AS "DEPTS_WITH_SALESMAN",
>     COUNT(
>         CASE 
>             WHEN MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000 
> THEN 1 ELSE 0 END) = 1 
>             THEN 1 
>             ELSE NULL 
>         END
>     ) AS "DEPTS_WITH_HIGH_SALARY"
> FROM "foodmart"."employee"
> GROUP BY "department_id";
> {code}
> This is not equivalent to the first query. 
> The first query counts departments, while the second—due to the GROUP BY 
> department_id and how COUNT(...) is used—counts rows (employees) inside each 
> qualifying department.
> For example, for this dataset:
> ||department_id||position_title||salary||
> |10|SALESMAN|1000|
> |10|CLERK|3000|
> |20|MANAGER|2000|
> |30|SALESMAN|1100|
>  
> The first query results in:
> ||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
> |1|2|2|
>  
> While the second query returns {*}one row per department{*}, and the numbers 
> are row counts, not department counts:
> ||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
> |2|2|2|
> |0|0|1|
> |0|1|0|
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-7138) RelToSqlConverterTest drops subquery boundary and inflates COUNT results

Reply via email to