[jira] [Updated] (CALCITE-6537) Add syntax to allow non-aggregated rows to be used in GROUPING SETS

Julian Hyde (Jira) Mon, 19 Aug 2024 13:49:03 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Julian Hyde updated CALCITE-6537:
---------------------------------
    Description: 
{{GROUPING SETS}} is a powerful and convenient syntax that allows a query to 
return results at varying levels of aggregation - in order words, to compute 
sub-totals and grand totals - but it does not currently allow the query to 
return unaggregated rows.

This case proposes to allow {{*}} (asterisk, star) as a column name in the 
{{GROUPING SETS}} sub-clause of {{{}GROUP BY{}}}. It would also allow {{*}} in 
{{ROLLUP}} and {{CUBE}} sub-clauses, which are syntactic sugar for {{{}GROUPING 
SETS{}}}.

(We use {{*}} by analogy with {{{}COUNT(*){}}}, which means 'include all 
possible columns, including a hypothetical row identifier, so that each row is 
considered unique', rather than by analogy with {{{}SELECT *{}}}, which means 
'expand to all non-system columns declared in the table'.)

For example,
{code:java}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY GROUPING SETS ((deptno), (*))
{code}
would return a row for each of the 14 employees, plus a total row for each of 
the 3 departments:
{noformat}
+--------+--------+----------+
| DEPTNO | ENAME  | SUM_SAL  |
+--------+--------+----------+
| 10     |        | 8750.00  |
| 10     | CLARK  | 2450.00  |
| 10     | KING   | 5000.00  |
| 10     | MILLER | 1300.00  |
| 20     |        | 10875.00 |
| 20     | ADAMS  | 1100.00  |
| 20     | FORD   | 3000.00  |
| 20     | JONES  | 2975.00  |
| 20     | SCOTT  | 3000.00  |
| 20     | SMITH  | 800.00   |
| 30     |        | 9400.00  |
| 30     | ALLEN  | 1600.00  |
| 30     | BLAKE  | 2850.00  |
| 30     | JAMES  | 950.00   |
| 30     | MARTIN | 1250.00  |
| 30     | TURNER | 1500.00  |
| 30     | WARD   | 1250.00  |
+--------+--------+----------+
17 rows selected (0.005 seconds)
{noformat}
Equivalent queries would be
{code:java}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY ROLLUP (deptno, *);
{code}
and
{code:java}
SELECT deptno, null AS ename, SUM(sal) AS sumSal
FROM emp
GROUP BY deptno
UNION ALL
SELECT deptno, ename, SUM(sal) OVER (ROWS CURRENT ROW)
FROM emp;
{code}
In this case, we know that {{empno}} is a primary key, and therefore we know 
that {{GROUPING SETS ((deptno), (deptno, empno)}} will not merge any rows. But 
in many cases the primary key is not known, does not exist, or the table 
actually contains duplicate rows. The {{*}} syntax allows users and tools to 
generate queries without having knowledge of primary keys.

Note that we use window aggregate syntax, {{{}SUM(sal) OVER (ROWS CURRENT 
ROW){}}}, to apply the {{SUM}} function to a single row (which of course 
returns the {{sal}} value). This approach can be applied to any aggregate 
function. Most aggregate functions have a simple identity when applied to a 
single row; for example {{COUNT}} returns 1.

{{GROUPING}} and {{GROUPING_ID}} should also work in this mode.

  was:
{{GROUPING SETS}} is a powerful and convenient syntax that allows a query to 
return results at varying levels of aggregation - in order words, to compute 
sub-totals and grand totals - but it does not currently allow the query to 
return unaggregated rows.

This case proposes to allow {{*}} (asterisk) as a column name in the {{GROUPING 
SETS}} sub-clause of {{GROUP BY}}. It would also allow {{*}} in {{ROLLUP}} and 
{{CUBE}} sub-clauses, which are syntactic sugar for {{GROUPING SETS}}.

(We use {{*}} by analogy with {{COUNT(*)}}, which means 'include all possible 
columns, including a hypothetical row identifier, so that each row is 
considered unique', rather than by analogy with {{SELECT *}}, which means 
'expand to all non-system columns declared in the table'.)

For example,
{code}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY GROUPING SETS ((deptno), (*))
{code}
would return a row for each of the 14 employees, plus a total row for each of 
the 3 departments:
{noformat}
+--------+--------+----------+
| DEPTNO | ENAME  | SUM_SAL  |
+--------+--------+----------+
| 10     |        | 8750.00  |
| 10     | CLARK  | 2450.00  |
| 10     | KING   | 5000.00  |
| 10     | MILLER | 1300.00  |
| 20     |        | 10875.00 |
| 20     | ADAMS  | 1100.00  |
| 20     | FORD   | 3000.00  |
| 20     | JONES  | 2975.00  |
| 20     | SCOTT  | 3000.00  |
| 20     | SMITH  | 800.00   |
| 30     |        | 9400.00  |
| 30     | ALLEN  | 1600.00  |
| 30     | BLAKE  | 2850.00  |
| 30     | JAMES  | 950.00   |
| 30     | MARTIN | 1250.00  |
| 30     | TURNER | 1500.00  |
| 30     | WARD   | 1250.00  |
+--------+--------+----------+
17 rows selected (0.005 seconds)
{code}

Equivalent queries would be
{code}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY ROLLUP (deptno, *);
{code}
and
{code}
SELECT deptno, null AS ename, SUM(sal) AS sumSal
FROM emp
GROUP BY deptno
UNION ALL
SELECT deptno, ename, sal
FROM emp;
{code}



> Add syntax to allow non-aggregated rows to be used in GROUPING SETS
> -------------------------------------------------------------------
>
>                 Key: CALCITE-6537
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6537
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Julian Hyde
>            Priority: Major
>
> {{GROUPING SETS}} is a powerful and convenient syntax that allows a query to 
> return results at varying levels of aggregation - in order words, to compute 
> sub-totals and grand totals - but it does not currently allow the query to 
> return unaggregated rows.
> This case proposes to allow {{*}} (asterisk, star) as a column name in the 
> {{GROUPING SETS}} sub-clause of {{{}GROUP BY{}}}. It would also allow {{*}} 
> in {{ROLLUP}} and {{CUBE}} sub-clauses, which are syntactic sugar for 
> {{{}GROUPING SETS{}}}.
> (We use {{*}} by analogy with {{{}COUNT(*){}}}, which means 'include all 
> possible columns, including a hypothetical row identifier, so that each row 
> is considered unique', rather than by analogy with {{{}SELECT *{}}}, which 
> means 'expand to all non-system columns declared in the table'.)
> For example,
> {code:java}
> SELECT deptno, ename, SUM(sal) AS sumSal
> FROM emp
> GROUP BY GROUPING SETS ((deptno), (*))
> {code}
> would return a row for each of the 14 employees, plus a total row for each of 
> the 3 departments:
> {noformat}
> +--------+--------+----------+
> | DEPTNO | ENAME  | SUM_SAL  |
> +--------+--------+----------+
> | 10     |        | 8750.00  |
> | 10     | CLARK  | 2450.00  |
> | 10     | KING   | 5000.00  |
> | 10     | MILLER | 1300.00  |
> | 20     |        | 10875.00 |
> | 20     | ADAMS  | 1100.00  |
> | 20     | FORD   | 3000.00  |
> | 20     | JONES  | 2975.00  |
> | 20     | SCOTT  | 3000.00  |
> | 20     | SMITH  | 800.00   |
> | 30     |        | 9400.00  |
> | 30     | ALLEN  | 1600.00  |
> | 30     | BLAKE  | 2850.00  |
> | 30     | JAMES  | 950.00   |
> | 30     | MARTIN | 1250.00  |
> | 30     | TURNER | 1500.00  |
> | 30     | WARD   | 1250.00  |
> +--------+--------+----------+
> 17 rows selected (0.005 seconds)
> {noformat}
> Equivalent queries would be
> {code:java}
> SELECT deptno, ename, SUM(sal) AS sumSal
> FROM emp
> GROUP BY ROLLUP (deptno, *);
> {code}
> and
> {code:java}
> SELECT deptno, null AS ename, SUM(sal) AS sumSal
> FROM emp
> GROUP BY deptno
> UNION ALL
> SELECT deptno, ename, SUM(sal) OVER (ROWS CURRENT ROW)
> FROM emp;
> {code}
> In this case, we know that {{empno}} is a primary key, and therefore we know 
> that {{GROUPING SETS ((deptno), (deptno, empno)}} will not merge any rows. 
> But in many cases the primary key is not known, does not exist, or the table 
> actually contains duplicate rows. The {{*}} syntax allows users and tools to 
> generate queries without having knowledge of primary keys.
> Note that we use window aggregate syntax, {{{}SUM(sal) OVER (ROWS CURRENT 
> ROW){}}}, to apply the {{SUM}} function to a single row (which of course 
> returns the {{sal}} value). This approach can be applied to any aggregate 
> function. Most aggregate functions have a simple identity when applied to a 
> single row; for example {{COUNT}} returns 1.
> {{GROUPING}} and {{GROUPING_ID}} should also work in this mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-6537) Add syntax to allow non-aggregated rows to be used in GROUPING SETS

Reply via email to