[jira] [Commented] (CALCITE-7116) Optimize queries with GROUPING SETS by converting them into equivalent UNION ALL of GROUP BY operations

Zhen Chen (Jira) Mon, 04 Aug 2025 08:25:07 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011911#comment-18011911
 ]


Zhen Chen commented on CALCITE-7116:
------------------------------------

[~julianhyde] The GroupId function has been optimized out during SQL-to-Rel 
conversion, with no configuration parameter available to toggle its state 
(enforced as always enabled). Given that constructing execution plans 
preserving GroupId functionality is exceptionally challenging, the GroupId 
function becomes effectively non-existent during RBO. Should we therefore 
consider it unnecessary to handle in our optimization rules?

> Optimize queries with GROUPING SETS by converting them into equivalent UNION 
> ALL of GROUP BY operations
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7116
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7116
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Zhen Chen
>            Assignee: Zhen Chen
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.41.0
>
>
> Currently, GROUPING SETS operations may not be optimally executed in some 
> cases. This ticket proposes a rule to transform GROUPING SETS into a series 
> of UNION ALL operations, each with its own GROUP BY clause. 
> Original query:
> {code:java}
> SELECT a, b, c FROM t GROUP BY GROUPING SETS ((a,b), (a,c))
> {code}
> Transformed to:
> {code:java}
> SELECT a, b, NULL AS c FROM t GROUP BY a, b
> UNION ALL
> SELECT a, NULL AS b, c FROM t GROUP BY a, c
> {code}
> Supplementary description:
> SQL is
> {code:sql}
> SELECT deptno,
>     job,
>     sal,
>     SUM(comm),
>     GROUPING(deptno) AS deptno_flag,
>     GROUPING(job) AS job_flag,
>     GROUPING(sal) AS sal_flag
> FROM emp
> GROUP BY GROUPING SETS ((deptno, job), (deptno, sal))
> {code}
> Original Plan is
> {code:sql}
> LogicalAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]], 
> EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], 
> SAL_FLAG=[GROUPING($2)])
>   LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Converted Plan should be
> {code:sql}
> LogicalUnion(all=[true])
>   LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[null:INTEGER], EXPR$3=[$2], 
> $f4=[0:BIGINT], $f5=[0:BIGINT], $f6=[1:BIGINT])
>     LogicalAggregate(group=[{0, 1}], EXPR$3=[SUM($3)])
>       LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalProject(DEPTNO=[$0], $f1=[null:VARCHAR(10)], SAL=[$1], EXPR$3=[$2], 
> $f4=[0:BIGINT], $f5=[1:BIGINT], $f6=[0:BIGINT])
>     LogicalAggregate(group=[{0, 2}], EXPR$3=[SUM($3)])
>       LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> I think we can only support the GROUPING function. GROUPING_ID has been 
> marked as deprecated in Calcite. The GROUP_ID function is used to mark 
> duplicate records. It is difficult to implement in this scenario and can be 
> temporarily disabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-7116) Optimize queries with GROUPING SETS by converting them into equivalent UNION ALL of GROUP BY operations

Reply via email to