[
https://issues.apache.org/jira/browse/CALCITE-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012035#comment-18012035
]
Zhen Chen edited comment on CALCITE-7126 at 8/5/25 6:44 AM:
------------------------------------------------------------
I've roughly pinpointed the cause of the error. Please see the plan below,
which is the plan corresponding to the SQL in the summary.
{code:java}
EnumerableSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC],
dir2=[ASC])
EnumerableUnion(all=[true])
EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]],
EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)],
SAL_FLAG=[GROUPING($2)])
EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3],
COMM=[$t4])
EnumerableValues(tuples=[[{ 100, 10, 'Bill ', 10000, 1000, 'aa' },
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null,
'aa' }, { 200, 20, 'Eric ', 8000, 500, 'aa' }]])
EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}]], EXPR$3=[SUM($3)],
DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], SAL_FLAG=[GROUPING($2)])
EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3],
COMM=[$t4])
EnumerableValues(tuples=[[{ 100, 10, 'Bill ', 10000, 1000, 'aa' },
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null,
'aa' }, { 200, 20, 'Eric ', 8000, 500, 'aa' }]])
!plan
{code}
When repeated combinations occur in groupSets, method
rewriteAggregateWithDuplicateGroupSets is rewritten as a two-AGG UNION. During
the rewrite, the groupSet is not set correctly (for example, in the second
EnumerableAggregate, the groupSet is {0,1,2} and the groupSets is [{0,1}],
while the correct groupSet should be {0,1}). If we want to modify it correctly,
it will affect the PROEJCT above AGG . For example, in this example, the column
with index 2 needs to be added to the PROEJCT with a NULL value, and
GROUPING(sal) needs to be rewritten to 1 and added to the PROEJCT.
Roughly like this:
{code:java}
EnumerableProject(DEPTNO=[$0], JOB=[$1], SAL=[null:VARCHAR(10)],
DEPTNO_FLAG=[$3], JOB_FLAG=[$3], SAL_FLAG=[1:INTEGER])
EnumerableAggregate(group=[{0, 1}], groups=[[{0, 1}]], EXPR$3=[SUM($3)],
DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)])
{code}
I'm confused because similar logic was originally planned for CALCITE-7116, but
I have to implement it again here. Is there a better way?
was (Author: jensen):
I've roughly pinpointed the cause of the error. Please see the plan below,
which is the plan corresponding to the SQL in the summary.
{code:java}
EnumerableSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC],
dir2=[ASC])
EnumerableUnion(all=[true])
EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]],
EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)],
SAL_FLAG=[GROUPING($2)])
EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3],
COMM=[$t4])
EnumerableValues(tuples=[[{ 100, 10, 'Bill ', 10000, 1000, 'aa' },
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null,
'aa' }, { 200, 20, 'Eric ', 8000, 500, 'aa' }]])
EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}]], EXPR$3=[SUM($3)],
DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], SAL_FLAG=[GROUPING($2)])
EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3],
COMM=[$t4])
EnumerableValues(tuples=[[{ 100, 10, 'Bill ', 10000, 1000, 'aa' },
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null,
'aa' }, { 200, 20, 'Eric ', 8000, 500, 'aa' }]])
!plan
{code}
When repeated combinations occur in groupSets, method
rewriteAggregateWithDuplicateGroupSets is rewritten as a two-AGG UNION. During
the rewrite, the groupSet is not set correctly (for example, in the second
EnumerableAggregate, the groupSet is {0,1,2} and the groupSets is [{0,1}],
while the correct groupSet should be {0,1}). If we want to modify it correctly,
it will affect the PROEJCT above AGG . For example, in this example, the column
with index 2 needs to be added to the PROEJCT with a NULL value, and
GROUPING(sal) needs to be rewritten to 1 and added to the PROEJCT. I'm confused
because similar logic was originally planned for CALCITE-7116, but I have to
implement it again here. Is there a better way?
> The calculation result of grouping function is wrong
> ----------------------------------------------------
>
> Key: CALCITE-7126
> URL: https://issues.apache.org/jira/browse/CALCITE-7126
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.40.0
> Reporter: Zhen Chen
> Priority: Major
>
> The calculation result of grouping function is wrong.
> in agg.iq result is
> {code:java}
> +--------+-----+-------+--------+-------------+----------+----------+
> | DEPTNO | JOB | SAL | EXPR$3 | DEPTNO_FLAG | JOB_FLAG | SAL_FLAG |
> +--------+-----+-------+--------+-------------+----------+----------+
> | 10 | aa | | 1250 | 0 | 0 | 1 |
> | 10 | aa | | 1250 | 0 | 0 | 0 |
> | 10 | | 7000 | | 0 | 1 | 0 |
> | 10 | | 10000 | 1000 | 0 | 1 | 0 |
> | 10 | | 11500 | 250 | 0 | 1 | 0 |
> | 20 | aa | | 500 | 0 | 0 | 1 |
> | 20 | aa | | 500 | 0 | 0 | 0 |
> | 20 | | 8000 | 500 | 0 | 1 | 0 |
> +--------+-----+-------+--------+-------------+----------+----------+
> (8 rows)
> {code}
> in pgsql result is
> {code:java}
> deptno | job | sal | sum | deptno_flag | job_flag | sal_flag
> --------+-----+-------+------+-------------+----------+----------
> 10 | aa | | 1250 | 0 | 0 | 1
> 10 | aa | | 1250 | 0 | 0 | 1
> 10 | | 7000 | | 0 | 1 | 0
> 10 | | 10000 | 1000 | 0 | 1 | 0
> 10 | | 11500 | 250 | 0 | 1 | 0
> 20 | aa | | 500 | 0 | 0 | 1
> 20 | aa | | 500 | 0 | 0 | 1
> 20 | | 8000 | 500 | 0 | 1 | 0
> (8 rows)
> {code}
> The sal_flag column has incorrect values in rows 2 and 7.
> SQL is
> {code:java}
> WITH emps_data AS (
> SELECT * FROM (VALUES
> (100, 10, 'Bill', 10000, 1000, 'aa'),
> (110, 10, 'Theodore', 11500, 250, 'aa'),
> (150, 10, 'Sebastian', 7000, NULL, 'aa'),
> (200, 20, 'Eric', 8000, 500, 'aa')
> ) AS t(empno, deptno, ename, sal, comm, job)
> )
> SELECT deptno, job, sal, SUM(comm),
> GROUPING(deptno) AS deptno_flag,
> GROUPING(job) AS job_flag,
> GROUPING(sal) AS sal_flag
> FROM emps_data
> GROUP BY GROUPING SETS ((deptno, job), (deptno, sal), (deptno, job))
> ORDER BY deptno, job, sal;
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)