[jira] [Comment Edited] (CALCITE-7126) The calculation result of grouping function is wrong

Zhen Chen (Jira) Fri, 08 Aug 2025 17:03:51 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012936#comment-18012936
 ]


Zhen Chen edited comment on CALCITE-7126 at 8/9/25 12:01 AM:
-------------------------------------------------------------

[~julianhyde] The core issue is indeed as described in the Jira summary. 
However, while fixing this problem, I encountered another issue (namely the 
so-called "allowChangeNullable" problem). I will try to explain this issue in 
detail and as clearly as possible. For example, In the original logic, 

{code:java}
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}, {}]])
{code}

would be expand to
{code:java}
Union
  Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
  Aggregate(group=[{0, 1, 2}], groups=[[{}]])
{code}

The second AGG is incorrect, the correct one should be like

{code:java}
Aggregate(group=[{}])
{code}

So here comes the next problem. 

For a simple SQL (such as “SELECT SUM(a) FROM tbl”), the original logic would 
infer `SUM` as nullable = TRUE. However, when there is a GROUP BY (e.g., 
“SELECT SUM(a) FROM t GROUP BY b”), SUM remains inferred as nullable = FALSE. 
There is a special case involving operations like ROLLUP, where the last GROUP 
SET is an empty. In this logic, when encountering grouping, all GROUP SETS need 
to be fully expanded. Just like this:

{code:java}
Union
  Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
  Aggregate(group=[{0, 1}], groups=[[{0, 1}]])
  Aggregate(group=[{0}], groups=[[{0}]])
  Aggregate(group=[{}]])
{code}

According to the original logic, if there is a SUM function in the final AGG, 
then it would infer SUM as nullable = TRUE. However, this behavior creates an 
inconsistency with the previous AGGs. Simply checking the "groupSet" property 
in the AGG is not sufficient to determine whether this AGG was generated during 
the rewrite process. Therefore, it is necessary to set a marker somewhere to 
indicate that the nullable property of SUM functions whose "groupSet" is 0 and 
are produced by the rewrite process should not be modified.

This is the problem that needs to be addressed throughout the entire process. 
Do you have any good suggestions for me?


was (Author: jensen):
[~julianhyde] The core issue is indeed as described in the Jira summary. 
However, while fixing this problem, I encountered another issue (namely the 
so-called "allowChangeNullable" problem). I will try to explain this issue in 
detail and as clearly as possible. For example, In the original logic, 

{code:java}
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}, {}]])
{code}

would be expand to
{code:java}
Union
  Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
  Aggregate(group=[{0, 1, 2}], groups=[[{}]])
{code}

The second AGG is incorrect, the correct one should be like

{code:java}
Aggregate(group=[{}])
{code}

So here comes the next problem. 

For a simple SQL (such as “SELECT SUM(a) FROM tbl”), the original logic would 
infer `SUM` as nullable = TRUE. However, when there is a GROUP BY (e.g., 
“SELECT SUM(a) FROM t GROUP BY b”), SUM remains inferred as nullable = FALSE. 
There is a special case involving operations like ROLLUP (i.e., {0,1,2}, {0,1}, 
{0}, {}), where the last GROUP SET is an empty. In this logic, when 
encountering grouping, all GROUP SETS need to be fully expanded. Just like this:

{code:java}
Union
  Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
  Aggregate(group=[{0, 1}], groups=[[{0, 1}]])
  Aggregate(group=[{0}], groups=[[{0}]])
  Aggregate(group=[{}]])
{code}

According to the original logic, if there is a SUM function in the final AGG, 
then it would infer SUM as nullable = TRUE. However, this behavior creates an 
inconsistency with the previous AGGs. Simply checking the "groupSet" property 
in the AGG is not sufficient to determine whether this AGG was generated during 
the rewrite process. Therefore, it is necessary to set a marker somewhere to 
indicate that the nullable property of SUM functions whose "groupSet" is 0 and 
are produced by the rewrite process should not be modified.

This is the problem that needs to be addressed throughout the entire process. 
Do you have any good suggestions for me?

> The calculation result of grouping function is wrong
> ----------------------------------------------------
>
>                 Key: CALCITE-7126
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7126
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.40.0
>            Reporter: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>
> The calculation result of grouping function is wrong.
> in agg.iq result is
> {code:java}
> +--------+-----+-------+--------+-------------+----------+----------+
> | DEPTNO | JOB | SAL   | EXPR$3 | DEPTNO_FLAG | JOB_FLAG | SAL_FLAG |
> +--------+-----+-------+--------+-------------+----------+----------+
> |     10 | aa  |       |   1250 |           0 |        0 |        1 |
> |     10 | aa  |       |   1250 |           0 |        0 |        0 |
> |     10 |     |  7000 |        |           0 |        1 |        0 |
> |     10 |     | 10000 |   1000 |           0 |        1 |        0 |
> |     10 |     | 11500 |    250 |           0 |        1 |        0 |
> |     20 | aa  |       |    500 |           0 |        0 |        1 |
> |     20 | aa  |       |    500 |           0 |        0 |        0 |
> |     20 |     |  8000 |    500 |           0 |        1 |        0 |
> +--------+-----+-------+--------+-------------+----------+----------+
> (8 rows)
> {code}
> in pgsql result is
> {code:java}
>  deptno | job |  sal  | sum  | deptno_flag | job_flag | sal_flag 
> --------+-----+-------+------+-------------+----------+----------
>      10 | aa  |       | 1250 |           0 |        0 |        1
>      10 | aa  |       | 1250 |           0 |        0 |        1
>      10 |     |  7000 |      |           0 |        1 |        0
>      10 |     | 10000 | 1000 |           0 |        1 |        0
>      10 |     | 11500 |  250 |           0 |        1 |        0
>      20 | aa  |       |  500 |           0 |        0 |        1
>      20 | aa  |       |  500 |           0 |        0 |        1
>      20 |     |  8000 |  500 |           0 |        1 |        0
> (8 rows)
> {code}
> The sal_flag column has incorrect values in rows 2 and 7.
> SQL is 
> {code:java}
> WITH emps_data AS (
>   SELECT * FROM (VALUES
>     (100, 10, 'Bill', 10000, 1000, 'aa'),
>     (110, 10, 'Theodore', 11500, 250, 'aa'),
>     (150, 10, 'Sebastian', 7000, NULL, 'aa'),
>     (200, 20, 'Eric', 8000, 500, 'aa')
>   ) AS t(empno, deptno, ename, sal, comm, job)
> )
> SELECT deptno, job, sal, SUM(comm),
>        GROUPING(deptno) AS deptno_flag,
>        GROUPING(job) AS job_flag,
>        GROUPING(sal) AS sal_flag
> FROM emps_data
> GROUP BY GROUPING SETS ((deptno, job), (deptno, sal), (deptno, job))
> ORDER BY deptno, job, sal;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-7126) The calculation result of grouping function is wrong

Reply via email to