[jira] [Updated] (CALCITE-7116) Optimize queries with GROUPING SETS by converting them into equivalent UNION ALL of GROUP BY operations.

Zhen Chen (Jira) Sun, 03 Aug 2025 05:49:04 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Zhen Chen updated CALCITE-7116:
-------------------------------
    Description: 
Currently, GROUPING SETS operations may not be optimally executed in some 
cases. This ticket proposes a rule to transform GROUPING SETS into a series of 
UNION ALL operations, each with its own GROUP BY clause. 

Original query:
{code:java}
SELECT a, b, c FROM t GROUP BY GROUPING SETS ((a,b), (a,c))
{code}

Transformed to:
{code:java}
SELECT a, b, NULL AS c FROM t GROUP BY a, b
UNION ALL
SELECT a, NULL AS b, c FROM t GROUP BY a, c
{code}

Supplementary description:

SQL is

{code:sql}
SELECT deptno,
    job,
    sal,
    SUM(comm),
    GROUPING(deptno) AS deptno_flag,
    GROUPING(job) AS job_flag,
    GROUPING(sal) AS sal_flag
FROM emp
GROUP BY GROUPING SETS ((deptno, job), (deptno, sal))
{code}

Original Plan is

{code:sql}
LogicalAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]], 
EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], 
SAL_FLAG=[GROUPING($2)])
  LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}

Converted Plan should be

{code:sql}
LogicalUnion(all=[true])
  LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[null:INTEGER], EXPR$3=[$2], 
$f4=[0:BIGINT], $f5=[0:BIGINT], $f6=[1:BIGINT])
    LogicalAggregate(group=[{0, 1}], EXPR$3=[SUM($3)])
      LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalProject(DEPTNO=[$0], $f1=[null:VARCHAR(10)], SAL=[$1], EXPR$3=[$2], 
$f4=[0:BIGINT], $f5=[1:BIGINT], $f6=[0:BIGINT])
    LogicalAggregate(group=[{0, 2}], EXPR$3=[SUM($3)])
      LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}

I think we can only support the GROUPING function. GROUPING_ID has been marked 
as deprecated in Calcite. The GROUP_ID function is used to mark duplicate 
records. It is difficult to implement in this scenario and can be temporarily 
disabled.


  was:
Currently, GROUPING SETS operations may not be optimally executed in some 
cases. This ticket proposes a rule to transform GROUPING SETS into a series of 
UNION ALL operations, each with its own GROUP BY clause. 

Original query:
{code:java}
SELECT a, b, c FROM t GROUP BY GROUPING SETS ((a,b), (a,c))
{code}

Transformed to:
{code:java}
SELECT a, b, NULL AS c FROM t GROUP BY a, b
UNION ALL
SELECT a, NULL AS b, c FROM t GROUP BY a, c
{code}

Supplementary description:

SQL is

```
SELECT deptno,
    job,
    sal,
    SUM(comm),
    GROUPING(deptno) AS deptno_flag,
    GROUPING(job) AS job_flag,
    GROUPING(sal) AS sal_flag
FROM emp
GROUP BY GROUPING SETS ((deptno, job), (deptno, sal))
```

Original Plan is

``` 
LogicalAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]], 
EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], 
SAL_FLAG=[GROUPING($2)])
  LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
```

Converted Plan should be

```
LogicalUnion(all=[true])
  LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[null:INTEGER], EXPR$3=[$2], 
$f4=[0:BIGINT], $f5=[0:BIGINT], $f6=[1:BIGINT])
    LogicalAggregate(group=[{0, 1}], EXPR$3=[SUM($3)])
      LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalProject(DEPTNO=[$0], $f1=[null:VARCHAR(10)], SAL=[$1], EXPR$3=[$2], 
$f4=[0:BIGINT], $f5=[1:BIGINT], $f6=[0:BIGINT])
    LogicalAggregate(group=[{0, 2}], EXPR$3=[SUM($3)])
      LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
```

I think we can only support the GROUPING function. GROUPING_ID has been marked 
as deprecated in Calcite. The GROUP_ID function is used to mark duplicate 
records. It is difficult to implement in this scenario and can be temporarily 
disabled.



> Optimize queries with GROUPING SETS by converting them into equivalent UNION 
> ALL of GROUP BY operations.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7116
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7116
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Zhen Chen
>            Assignee: Yu Xu
>            Priority: Minor
>             Fix For: 1.41.0
>
>
> Currently, GROUPING SETS operations may not be optimally executed in some 
> cases. This ticket proposes a rule to transform GROUPING SETS into a series 
> of UNION ALL operations, each with its own GROUP BY clause. 
> Original query:
> {code:java}
> SELECT a, b, c FROM t GROUP BY GROUPING SETS ((a,b), (a,c))
> {code}
> Transformed to:
> {code:java}
> SELECT a, b, NULL AS c FROM t GROUP BY a, b
> UNION ALL
> SELECT a, NULL AS b, c FROM t GROUP BY a, c
> {code}
> Supplementary description:
> SQL is
> {code:sql}
> SELECT deptno,
>     job,
>     sal,
>     SUM(comm),
>     GROUPING(deptno) AS deptno_flag,
>     GROUPING(job) AS job_flag,
>     GROUPING(sal) AS sal_flag
> FROM emp
> GROUP BY GROUPING SETS ((deptno, job), (deptno, sal))
> {code}
> Original Plan is
> {code:sql}
> LogicalAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]], 
> EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], 
> SAL_FLAG=[GROUPING($2)])
>   LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> Converted Plan should be
> {code:sql}
> LogicalUnion(all=[true])
>   LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[null:INTEGER], EXPR$3=[$2], 
> $f4=[0:BIGINT], $f5=[0:BIGINT], $f6=[1:BIGINT])
>     LogicalAggregate(group=[{0, 1}], EXPR$3=[SUM($3)])
>       LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalProject(DEPTNO=[$0], $f1=[null:VARCHAR(10)], SAL=[$1], EXPR$3=[$2], 
> $f4=[0:BIGINT], $f5=[1:BIGINT], $f6=[0:BIGINT])
>     LogicalAggregate(group=[{0, 2}], EXPR$3=[SUM($3)])
>       LogicalProject(DEPTNO=[$7], JOB=[$2], SAL=[$5], COMM=[$6])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> I think we can only support the GROUPING function. GROUPING_ID has been 
> marked as deprecated in Calcite. The GROUP_ID function is used to mark 
> duplicate records. It is difficult to implement in this scenario and can be 
> temporarily disabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-7116) Optimize queries with GROUPING SETS by converting them into equivalent UNION ALL of GROUP BY operations.

Reply via email to