[ 
https://issues.apache.org/jira/browse/CALCITE-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011320#comment-18011320
 ] 

Zhen Chen commented on CALCITE-7116:
------------------------------------

[~julianhyde] As I currently understand it, some databases evaluate grouping 
sets with two rounds of aggregation. For example, 
"GROUP BY GROUPING SETS ((a,b), (a,c))" is executed by first aggregating by a, b, 
c and groupingId, and then aggregating that result separately for (a,b) and 
(a,c). Once the query is split into UNION ALL, only the (a,b) and (a,c) 
aggregations are needed, which reduces the volume of intermediate data. So far, 
performance improvements from this rewrite have been observed in Doris and 
Databend, although this depends largely on how each engine implements its 
aggregation operators.
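The two execution strategies described above can be compared on a toy in-memory model. This is a minimal sketch under stated assumptions, not how Doris, Databend or Calcite actually implement it: the functions `project`, `two_phase` and `union_all` are hypothetical, rows are plain `(a, b, c)` tuples, COUNT(*) is added as the aggregate purely for illustration, and the grouping-set index stands in for groupingId. It only shows that both plans return the same groups and where the extra intermediate aggregate sits; it says nothing about which plan is faster on a real engine.

```python
from collections import Counter

def project(row, grouping_set):
    # Keep the columns in the grouping set and null out the others,
    # as GROUPING SETS does for columns not in the current set.
    return tuple(v if i in grouping_set else None for i, v in enumerate(row))

def two_phase(rows, grouping_sets):
    # Phase 1 (assumed): one aggregation over all grouping columns (a, b, c),
    # computing a partial COUNT(*) per distinct (a, b, c).
    phase1 = Counter(rows)
    # Phase 2: re-aggregate the phase-1 output once per grouping set,
    # summing the partial counts. The set index plays the role of groupingId.
    result = Counter()
    for gid, gs in enumerate(grouping_sets):
        for key, cnt in phase1.items():
            result[(gid, project(key, gs))] += cnt
    # len(phase1) is the number of intermediate rows produced by phase 1.
    return result, len(phase1)

def union_all(rows, grouping_sets):
    # One independent GROUP BY per grouping set, each reading the input directly.
    result = Counter()
    for gid, gs in enumerate(grouping_sets):
        for row in rows:
            result[(gid, project(row, gs))] += 1
    return result

rows = [(1, "x", "p"), (1, "x", "q"), (1, "y", "p"), (2, "x", "p"), (2, "x", "p")]
sets = [{0, 1}, {0, 2}]  # (a, b) and (a, c) as column positions
merged, intermediate = two_phase(rows, sets)
print(merged == union_all(rows, sets), intermediate)  # True 4
```

Whether skipping the phase-1 aggregate pays off depends on how many distinct (a, b, c) combinations there are relative to the input size and on the engine's operators, which matches the caveat above.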

> Optimize queries with GROUPING SETS by converting them into equivalent UNION ALL of GROUP BY operations.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7116
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7116
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Zhen Chen
>            Assignee: Yu Xu
>            Priority: Minor
>             Fix For: 1.41.0
>
>
> Currently, GROUPING SETS operations may not be executed optimally in some 
> cases. This ticket proposes a rule that transforms GROUPING SETS into a 
> UNION ALL of GROUP BY queries, one per grouping set. 
> Original query:
> {code:java}
> SELECT a, b, c FROM t GROUP BY GROUPING SETS ((a,b), (a,c))
> {code}
> Transformed to:
> {code:java}
> SELECT a, b, NULL AS c FROM t GROUP BY a, b
> UNION ALL
> SELECT a, NULL AS b, c FROM t GROUP BY a, c
> {code}
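At the SQL-text level, the transformation shown in the ticket can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the rule proposed for Calcite: the helper `rewrite_grouping_sets` is hypothetical, the select list holds only plain column references, and there are no aggregate calls or GROUPING() functions. A Calcite rule would operate on relational expressions rather than on SQL strings.

```python
def rewrite_grouping_sets(table, columns, grouping_sets):
    """Expand SELECT <columns> FROM <table> GROUP BY GROUPING SETS (...) into UNION ALL.

    Hypothetical helper: handles plain column references only (no aggregates,
    no GROUPING()/groupingId, no CASTs on the NULL placeholders).
    """
    branches = []
    for gs in grouping_sets:
        # Columns outside this grouping set become NULL placeholders.
        items = [c if c in gs else f"NULL AS {c}" for c in columns]
        # Keep the select-list order for the GROUP BY keys; an empty set becomes GROUP BY ().
        keys = [c for c in columns if c in gs]
        group_by = ", ".join(keys) if keys else "()"
        branches.append(f"SELECT {', '.join(items)} FROM {table} GROUP BY {group_by}")
    return "\nUNION ALL\n".join(branches)

print(rewrite_grouping_sets("t", ["a", "b", "c"], [("a", "b"), ("a", "c")]))
# SELECT a, b, NULL AS c FROM t GROUP BY a, b
# UNION ALL
# SELECT a, NULL AS b, c FROM t GROUP BY a, c
```

A full implementation has to handle details this sketch ignores: the NULL placeholders may need a CAST so that every UNION ALL branch has the same column types, and any GROUPING() or groupingId reference becomes a constant within each branch.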



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
