[
https://issues.apache.org/jira/browse/CALCITE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Hyde updated CALCITE-6537:
---------------------------------
Description:
{{GROUPING SETS}} is a powerful and convenient syntax that allows a query to
return results at varying levels of aggregation - in order words, to compute
sub-totals and grand totals - but it does not currently allow the query to
return unaggregated rows.
This case proposes to allow {{*}} (asterisk, star) as a column name in the
{{GROUPING SETS}} sub-clause of {{{}GROUP BY{}}}. It would also allow {{*}} in
{{ROLLUP}} and {{CUBE}} sub-clauses, which are syntactic sugar for {{{}GROUPING
SETS{}}}.
(We use {{*}} by analogy with {{{}COUNT(*){}}}, which means 'include all
possible columns, including a hypothetical row identifier, so that each row is
considered unique', rather than by analogy with {{{}SELECT *{}}}, which means
'expand to all non-system columns declared in the table'.)
For example,
{code:java}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY GROUPING SETS ((deptno), (*))
{code}
would return a row for each of the 14 employees, plus a total row for each of
the 3 departments:
{noformat}
+--------+--------+----------+
| DEPTNO | ENAME | SUM_SAL |
+--------+--------+----------+
| 10 | | 8750.00 |
| 10 | CLARK | 2450.00 |
| 10 | KING | 5000.00 |
| 10 | MILLER | 1300.00 |
| 20 | | 10875.00 |
| 20 | ADAMS | 1100.00 |
| 20 | FORD | 3000.00 |
| 20 | JONES | 2975.00 |
| 20 | SCOTT | 3000.00 |
| 20 | SMITH | 800.00 |
| 30 | | 9400.00 |
| 30 | ALLEN | 1600.00 |
| 30 | BLAKE | 2850.00 |
| 30 | JAMES | 950.00 |
| 30 | MARTIN | 1250.00 |
| 30 | TURNER | 1500.00 |
| 30 | WARD | 1250.00 |
+--------+--------+----------+
17 rows selected (0.005 seconds)
{noformat}
Equivalent queries would be
{code:java}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY ROLLUP (deptno, *);
{code}
and
{code:java}
SELECT deptno, null AS ename, SUM(sal) AS sumSal
FROM emp
GROUP BY deptno
UNION ALL
SELECT deptno, ename, SUM(sal) OVER (ROWS CURRENT ROW)
FROM emp;
{code}
In this case, we know that {{empno}} is a primary key, and therefore we know
that {{GROUPING SETS ((deptno), (deptno, empno)}} will not merge any rows. But
in many cases the primary key is not known, does not exist, or the table
actually contains duplicate rows. The {{*}} syntax allows users and tools to
generate queries without having knowledge of primary keys.
Note that we use window aggregate syntax, {{{}SUM(sal) OVER (ROWS CURRENT
ROW){}}}, to apply the {{SUM}} function to a single row (which of course
returns the {{sal}} value). This approach can be applied to any aggregate
function. Most aggregate functions have a simple identity when applied to a
single row; for example {{COUNT}} returns 1.
{{GROUPING}} and {{GROUPING_ID}} should also work in this mode.
was:
{{GROUPING SETS}} is a powerful and convenient syntax that allows a query to
return results at varying levels of aggregation - in order words, to compute
sub-totals and grand totals - but it does not currently allow the query to
return unaggregated rows.
This case proposes to allow {{*}} (asterisk) as a column name in the {{GROUPING
SETS}} sub-clause of {{GROUP BY}}. It would also allow {{*}} in {{ROLLUP}} and
{{CUBE}} sub-clauses, which are syntactic sugar for {{GROUPING SETS}}.
(We use {{*}} by analogy with {{COUNT(*)}}, which means 'include all possible
columns, including a hypothetical row identifier, so that each row is
considered unique', rather than by analogy with {{SELECT *}}, which means
'expand to all non-system columns declared in the table'.)
For example,
{code}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY GROUPING SETS ((deptno), (*))
{code}
would return a row for each of the 14 employees, plus a total row for each of
the 3 departments:
{noformat}
+--------+--------+----------+
| DEPTNO | ENAME | SUM_SAL |
+--------+--------+----------+
| 10 | | 8750.00 |
| 10 | CLARK | 2450.00 |
| 10 | KING | 5000.00 |
| 10 | MILLER | 1300.00 |
| 20 | | 10875.00 |
| 20 | ADAMS | 1100.00 |
| 20 | FORD | 3000.00 |
| 20 | JONES | 2975.00 |
| 20 | SCOTT | 3000.00 |
| 20 | SMITH | 800.00 |
| 30 | | 9400.00 |
| 30 | ALLEN | 1600.00 |
| 30 | BLAKE | 2850.00 |
| 30 | JAMES | 950.00 |
| 30 | MARTIN | 1250.00 |
| 30 | TURNER | 1500.00 |
| 30 | WARD | 1250.00 |
+--------+--------+----------+
17 rows selected (0.005 seconds)
{code}
Equivalent queries would be
{code}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY ROLLUP (deptno, *);
{code}
and
{code}
SELECT deptno, null AS ename, SUM(sal) AS sumSal
FROM emp
GROUP BY deptno
UNION ALL
SELECT deptno, ename, sal
FROM emp;
{code}
> Add syntax to allow non-aggregated rows to be used in GROUPING SETS
> -------------------------------------------------------------------
>
> Key: CALCITE-6537
> URL: https://issues.apache.org/jira/browse/CALCITE-6537
> Project: Calcite
> Issue Type: Improvement
> Reporter: Julian Hyde
> Priority: Major
>
> {{GROUPING SETS}} is a powerful and convenient syntax that allows a query to
> return results at varying levels of aggregation - in order words, to compute
> sub-totals and grand totals - but it does not currently allow the query to
> return unaggregated rows.
> This case proposes to allow {{*}} (asterisk, star) as a column name in the
> {{GROUPING SETS}} sub-clause of {{{}GROUP BY{}}}. It would also allow {{*}}
> in {{ROLLUP}} and {{CUBE}} sub-clauses, which are syntactic sugar for
> {{{}GROUPING SETS{}}}.
> (We use {{*}} by analogy with {{{}COUNT(*){}}}, which means 'include all
> possible columns, including a hypothetical row identifier, so that each row
> is considered unique', rather than by analogy with {{{}SELECT *{}}}, which
> means 'expand to all non-system columns declared in the table'.)
> For example,
> {code:java}
> SELECT deptno, ename, SUM(sal) AS sumSal
> FROM emp
> GROUP BY GROUPING SETS ((deptno), (*))
> {code}
> would return a row for each of the 14 employees, plus a total row for each of
> the 3 departments:
> {noformat}
> +--------+--------+----------+
> | DEPTNO | ENAME | SUM_SAL |
> +--------+--------+----------+
> | 10 | | 8750.00 |
> | 10 | CLARK | 2450.00 |
> | 10 | KING | 5000.00 |
> | 10 | MILLER | 1300.00 |
> | 20 | | 10875.00 |
> | 20 | ADAMS | 1100.00 |
> | 20 | FORD | 3000.00 |
> | 20 | JONES | 2975.00 |
> | 20 | SCOTT | 3000.00 |
> | 20 | SMITH | 800.00 |
> | 30 | | 9400.00 |
> | 30 | ALLEN | 1600.00 |
> | 30 | BLAKE | 2850.00 |
> | 30 | JAMES | 950.00 |
> | 30 | MARTIN | 1250.00 |
> | 30 | TURNER | 1500.00 |
> | 30 | WARD | 1250.00 |
> +--------+--------+----------+
> 17 rows selected (0.005 seconds)
> {noformat}
> Equivalent queries would be
> {code:java}
> SELECT deptno, ename, SUM(sal) AS sumSal
> FROM emp
> GROUP BY ROLLUP (deptno, *);
> {code}
> and
> {code:java}
> SELECT deptno, null AS ename, SUM(sal) AS sumSal
> FROM emp
> GROUP BY deptno
> UNION ALL
> SELECT deptno, ename, SUM(sal) OVER (ROWS CURRENT ROW)
> FROM emp;
> {code}
> In this case, we know that {{empno}} is a primary key, and therefore we know
> that {{GROUPING SETS ((deptno), (deptno, empno)}} will not merge any rows.
> But in many cases the primary key is not known, does not exist, or the table
> actually contains duplicate rows. The {{*}} syntax allows users and tools to
> generate queries without having knowledge of primary keys.
> Note that we use window aggregate syntax, {{{}SUM(sal) OVER (ROWS CURRENT
> ROW){}}}, to apply the {{SUM}} function to a single row (which of course
> returns the {{sal}} value). This approach can be applied to any aggregate
> function. Most aggregate functions have a simple identity when applied to a
> single row; for example {{COUNT}} returns 1.
> {{GROUPING}} and {{GROUPING_ID}} should also work in this mode.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)