[jira] [Comment Edited] (CALCITE-7126) The calculation result of grouping function is wrong

Zhen Chen (Jira) Tue, 05 Aug 2025 00:03:12 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012035#comment-18012035
 ]


Zhen Chen edited comment on CALCITE-7126 at 8/5/25 6:44 AM:
------------------------------------------------------------

I've roughly pinpointed the cause of the error. Please see the plan below, 
which is the plan corresponding to the SQL in the summary.

{code:java}
EnumerableSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], 
dir2=[ASC])
  EnumerableUnion(all=[true])
    EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]], 
EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], 
SAL_FLAG=[GROUPING($2)])
      EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3], 
COMM=[$t4])
        EnumerableValues(tuples=[[{ 100, 10, 'Bill     ', 10000, 1000, 'aa' }, 
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null, 
'aa' }, { 200, 20, 'Eric     ', 8000, 500, 'aa' }]])
    EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}]], EXPR$3=[SUM($3)], 
DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], SAL_FLAG=[GROUPING($2)])
      EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3], 
COMM=[$t4])
        EnumerableValues(tuples=[[{ 100, 10, 'Bill     ', 10000, 1000, 'aa' }, 
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null, 
'aa' }, { 200, 20, 'Eric     ', 8000, 500, 'aa' }]])
!plan
{code}

When repeated combinations occur in groupSets, method 
rewriteAggregateWithDuplicateGroupSets is rewritten as a two-AGG UNION. During 
the rewrite, the groupSet is not set correctly (for example, in the second 
EnumerableAggregate, the groupSet is {0,1,2} and the groupSets is [{0,1}], 
while the correct groupSet should be {0,1}). If we want to modify it correctly, 
it will affect the PROEJCT above AGG . For example, in this example, the column 
with index 2 needs to be added to the PROEJCT with a NULL value, and 
GROUPING(sal) needs to be rewritten to 1 and added to the PROEJCT. 

Roughly like this:

{code:java}
EnumerableProject(DEPTNO=[$0], JOB=[$1], SAL=[null:VARCHAR(10)], 
DEPTNO_FLAG=[$3], JOB_FLAG=[$3], SAL_FLAG=[1:INTEGER])
  EnumerableAggregate(group=[{0, 1}], groups=[[{0, 1}]], EXPR$3=[SUM($3)], 
DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)])
{code}


I'm confused because similar logic was originally planned for CALCITE-7116, but 
I have to implement it again here. Is there a better way?


was (Author: jensen):
I've roughly pinpointed the cause of the error. Please see the plan below, 
which is the plan corresponding to the SQL in the summary.

{code:java}
EnumerableSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], 
dir2=[ASC])
  EnumerableUnion(all=[true])
    EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}, {0, 2}]], 
EXPR$3=[SUM($3)], DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], 
SAL_FLAG=[GROUPING($2)])
      EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3], 
COMM=[$t4])
        EnumerableValues(tuples=[[{ 100, 10, 'Bill     ', 10000, 1000, 'aa' }, 
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null, 
'aa' }, { 200, 20, 'Eric     ', 8000, 500, 'aa' }]])
    EnumerableAggregate(group=[{0, 1, 2}], groups=[[{0, 1}]], EXPR$3=[SUM($3)], 
DEPTNO_FLAG=[GROUPING($0)], JOB_FLAG=[GROUPING($1)], SAL_FLAG=[GROUPING($2)])
      EnumerableCalc(expr#0..5=[{inputs}], DEPTNO=[$t1], JOB=[$t5], SAL=[$t3], 
COMM=[$t4])
        EnumerableValues(tuples=[[{ 100, 10, 'Bill     ', 10000, 1000, 'aa' }, 
{ 110, 10, 'Theodore ', 11500, 250, 'aa' }, { 150, 10, 'Sebastian', 7000, null, 
'aa' }, { 200, 20, 'Eric     ', 8000, 500, 'aa' }]])
!plan
{code}

When repeated combinations occur in groupSets, method 
rewriteAggregateWithDuplicateGroupSets is rewritten as a two-AGG UNION. During 
the rewrite, the groupSet is not set correctly (for example, in the second 
EnumerableAggregate, the groupSet is {0,1,2} and the groupSets is [{0,1}], 
while the correct groupSet should be {0,1}). If we want to modify it correctly, 
it will affect the PROEJCT above AGG . For example, in this example, the column 
with index 2 needs to be added to the PROEJCT with a NULL value, and 
GROUPING(sal) needs to be rewritten to 1 and added to the PROEJCT. I'm confused 
because similar logic was originally planned for CALCITE-7116, but I have to 
implement it again here. Is there a better way?

> The calculation result of grouping function is wrong
> ----------------------------------------------------
>
>                 Key: CALCITE-7126
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7126
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.40.0
>            Reporter: Zhen Chen
>            Priority: Major
>
> The calculation result of grouping function is wrong.
> in agg.iq result is
> {code:java}
> +--------+-----+-------+--------+-------------+----------+----------+
> | DEPTNO | JOB | SAL   | EXPR$3 | DEPTNO_FLAG | JOB_FLAG | SAL_FLAG |
> +--------+-----+-------+--------+-------------+----------+----------+
> |     10 | aa  |       |   1250 |           0 |        0 |        1 |
> |     10 | aa  |       |   1250 |           0 |        0 |        0 |
> |     10 |     |  7000 |        |           0 |        1 |        0 |
> |     10 |     | 10000 |   1000 |           0 |        1 |        0 |
> |     10 |     | 11500 |    250 |           0 |        1 |        0 |
> |     20 | aa  |       |    500 |           0 |        0 |        1 |
> |     20 | aa  |       |    500 |           0 |        0 |        0 |
> |     20 |     |  8000 |    500 |           0 |        1 |        0 |
> +--------+-----+-------+--------+-------------+----------+----------+
> (8 rows)
> {code}
> in pgsql result is
> {code:java}
>  deptno | job |  sal  | sum  | deptno_flag | job_flag | sal_flag 
> --------+-----+-------+------+-------------+----------+----------
>      10 | aa  |       | 1250 |           0 |        0 |        1
>      10 | aa  |       | 1250 |           0 |        0 |        1
>      10 |     |  7000 |      |           0 |        1 |        0
>      10 |     | 10000 | 1000 |           0 |        1 |        0
>      10 |     | 11500 |  250 |           0 |        1 |        0
>      20 | aa  |       |  500 |           0 |        0 |        1
>      20 | aa  |       |  500 |           0 |        0 |        1
>      20 |     |  8000 |  500 |           0 |        1 |        0
> (8 rows)
> {code}
> The sal_flag column has incorrect values in rows 2 and 7.
> SQL is 
> {code:java}
> WITH emps_data AS (
>   SELECT * FROM (VALUES
>     (100, 10, 'Bill', 10000, 1000, 'aa'),
>     (110, 10, 'Theodore', 11500, 250, 'aa'),
>     (150, 10, 'Sebastian', 7000, NULL, 'aa'),
>     (200, 20, 'Eric', 8000, 500, 'aa')
>   ) AS t(empno, deptno, ename, sal, comm, job)
> )
> SELECT deptno, job, sal, SUM(comm),
>        GROUPING(deptno) AS deptno_flag,
>        GROUPING(job) AS job_flag,
>        GROUPING(sal) AS sal_flag
> FROM emps_data
> GROUP BY GROUPING SETS ((deptno, job), (deptno, sal), (deptno, job))
> ORDER BY deptno, job, sal;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-7126) The calculation result of grouping function is wrong

Reply via email to