[
https://issues.apache.org/jira/browse/CALCITE-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014715#comment-18014715
]
Lino Rosa commented on CALCITE-7138:
------------------------------------
Main suspect here is {{{}SqlImplementor#needNewSubQuery{}}}. We want it to
return {{true}} on the upper Aggregate of this query, but it returns
{{{}false{}}}.
One way it would return {{true}} is on this block:
{code:java}
if (rel instanceof Aggregate
&& (clauses.contains(Clause.ORDER_BY)
|| clauses.contains(Clause.FETCH)
|| clauses.contains(Clause.OFFSET))) {
return true;
}
{code}
However at that point clauses only contains these:
{noformat}
0 = {SqlImplementor$Clause@7902} "FROM"
1 = {SqlImplementor$Clause@9293} "SELECT"
2 = {SqlImplementor$Clause@9340} "GROUP_BY"
3 = {SqlImplementor$Clause@9293} "SELECT"
{noformat}
----
Additionally, further down the same method there's this other block below.
{code:java}
if (clauses.contains(Clause.GROUP_BY)) {
// Avoid losing the distinct attribute of inner aggregate.
return !hasNestedAgg || Aggregate.isNotGrandTotal(agg);
}
{code}
In this case this still returns {{{}false{}}}.
I suspect this is the place I should change. Maybe jus having a GROUP BY should
be sufficient to return {{true}} ({_}in which case I delete the block below and
add {{clauses.contains(Clause.GROUP_By)}} to the block above?{_})
> RelToSqlConverterTest drops subquery boundary and inflates COUNT results
> ------------------------------------------------------------------------
>
> Key: CALCITE-7138
> URL: https://issues.apache.org/jira/browse/CALCITE-7138
> Project: Calcite
> Issue Type: Bug
> Reporter: Lino Rosa
> Priority: Major
>
> Take this input query:
> {code:java}
> SELECT
> COUNT(CASE WHEN has_salesman = 1 AND has_high_salary = 1 THEN 1 END) AS
> depts_with_salesman_and_high_salary,
> COUNT(CASE WHEN has_salesman = 1 THEN 1 END) AS depts_with_salesman,
> COUNT(CASE WHEN has_high_salary = 1 THEN 1 END) AS depts_with_high_salary
> FROM (
> SELECT
> MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0 END) AS
> has_salesman,
> MAX(CASE WHEN "salary" > 1250 THEN 1 ELSE 0 END) AS has_high_salary
> FROM "foodmart"."employee"
> GROUP BY "department_id"
> ) AS dept_flags;{code}
> After going through `SqlToRelConverter`, the resulting SQL will have the
> GROUP BY from the subquery inlined into a single query:
> {code:java}
> SELECT
> COUNT(
> CASE
> WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0
> END) = 1
> AND MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000
> THEN 1 ELSE 0 END) = 1
> THEN 1
> ELSE NULL
> END
> ) AS "DEPTS_WITH_SALESMAN_AND_HIGH_SALARY",
> COUNT(
> CASE
> WHEN MAX(CASE WHEN "position_title" = 'SALESMAN' THEN 1 ELSE 0
> END) = 1
> THEN 1
> ELSE NULL
> END
> ) AS "DEPTS_WITH_SALESMAN",
> COUNT(
> CASE
> WHEN MAX(CASE WHEN CAST("salary" AS DECIMAL(14, 4)) > 1250.0000
> THEN 1 ELSE 0 END) = 1
> THEN 1
> ELSE NULL
> END
> ) AS "DEPTS_WITH_HIGH_SALARY"
> FROM "foodmart"."employee"
> GROUP BY "department_id";
> {code}
> This is not equivalent to the first query.
> The first query counts departments, while the second—due to the GROUP BY
> department_id and how COUNT(...) is used—counts rows (employees) inside each
> qualifying department.
> For example, for this dataset:
> ||department_id||position_title||salary||
> |10|SALESMAN|1000|
> |10|CLERK|3000|
> |20|MANAGER|2000|
> |30|SALESMAN|1100|
>
> The first query results in:
> ||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
> |1|2|2|
>
> While the second query returns {*}one row per department{*}, and the numbers
> are row counts, not department counts:
> ||depts_with_salesman_and_high_salary||depts_with_salesman||depts_with_high_salary||
> |2|2|2|
> |0|0|1|
> |0|1|0|
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)