[ 
https://issues.apache.org/jira/browse/CALCITE-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012936#comment-18012936
 ] 

Zhen Chen commented on CALCITE-7126:
------------------------------------

[~julianhyde] The core issue is indeed as described in the Jira summary. 
However, while fixing this problem, I encountered another issue (namely the 
so-called "allowChangeNullable" problem). I will try to explain this issue in 
detail and as clearly as possible. For example, In the original logic, 
"Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}, {}]])" would be expand to 
"Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])" and "Aggregate(group=[{0, 
1, 2}], groups=[[{}]])". The second AGG is incorrect, the correct one should be 
like "Aggregate(group=[{}])". So here comes the next problem. 

For a simple SQL (such as “SELECT SUM(a) FROM tbl”), the original logic would 
infer `SUM` as **nullable = TRUE**. However, when there is a **GROUP BY** 
(e.g., “SELECT SUM(a) FROM t GROUP BY b”), `SUM` remains inferred as **nullable 
= FALSE**. There is a special case involving operations like **ROLLUP** (i.e., 
`{0,1,2}, {0,1}, {0}, {}`), where the last GROUP SET is an empty. In this 
logic, when encountering grouping, all GROUP SETS need to be fully expanded. 
Just like this:

{code:java}
Union
  Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
  Aggregate(group=[{0, 1}], groups=[[{0, 1}]])
  Aggregate(group=[{0}], groups=[[{0}]])
  Aggregate(group=[{}]])
{code}

According to the original logic, if there is a `SUM` function in the final AGG, 
then it would infer `SUM` as **nullable = TRUE**. However, this behavior 
creates an inconsistency with the previous AGGs. Simply checking the "groupSet" 
property in the AGG is not sufficient to determine whether this AGG was 
generated during the rewrite process. Therefore, it is necessary to set a 
marker somewhere to indicate that the nullable property of SUM functions whose 
"groupSet" is 0 and are produced by the rewrite process should not be modified.

This is the problem that needs to be addressed throughout the entire process. 
Do you have any good suggestions for me?

> The calculation result of grouping function is wrong
> ----------------------------------------------------
>
>                 Key: CALCITE-7126
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7126
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.40.0
>            Reporter: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>
> The calculation result of grouping function is wrong.
> in agg.iq result is
> {code:java}
> +--------+-----+-------+--------+-------------+----------+----------+
> | DEPTNO | JOB | SAL   | EXPR$3 | DEPTNO_FLAG | JOB_FLAG | SAL_FLAG |
> +--------+-----+-------+--------+-------------+----------+----------+
> |     10 | aa  |       |   1250 |           0 |        0 |        1 |
> |     10 | aa  |       |   1250 |           0 |        0 |        0 |
> |     10 |     |  7000 |        |           0 |        1 |        0 |
> |     10 |     | 10000 |   1000 |           0 |        1 |        0 |
> |     10 |     | 11500 |    250 |           0 |        1 |        0 |
> |     20 | aa  |       |    500 |           0 |        0 |        1 |
> |     20 | aa  |       |    500 |           0 |        0 |        0 |
> |     20 |     |  8000 |    500 |           0 |        1 |        0 |
> +--------+-----+-------+--------+-------------+----------+----------+
> (8 rows)
> {code}
> in pgsql result is
> {code:java}
>  deptno | job |  sal  | sum  | deptno_flag | job_flag | sal_flag 
> --------+-----+-------+------+-------------+----------+----------
>      10 | aa  |       | 1250 |           0 |        0 |        1
>      10 | aa  |       | 1250 |           0 |        0 |        1
>      10 |     |  7000 |      |           0 |        1 |        0
>      10 |     | 10000 | 1000 |           0 |        1 |        0
>      10 |     | 11500 |  250 |           0 |        1 |        0
>      20 | aa  |       |  500 |           0 |        0 |        1
>      20 | aa  |       |  500 |           0 |        0 |        1
>      20 |     |  8000 |  500 |           0 |        1 |        0
> (8 rows)
> {code}
> The sal_flag column has incorrect values in rows 2 and 7.
> SQL is 
> {code:java}
> WITH emps_data AS (
>   SELECT * FROM (VALUES
>     (100, 10, 'Bill', 10000, 1000, 'aa'),
>     (110, 10, 'Theodore', 11500, 250, 'aa'),
>     (150, 10, 'Sebastian', 7000, NULL, 'aa'),
>     (200, 20, 'Eric', 8000, 500, 'aa')
>   ) AS t(empno, deptno, ename, sal, comm, job)
> )
> SELECT deptno, job, sal, SUM(comm),
>        GROUPING(deptno) AS deptno_flag,
>        GROUPING(job) AS job_flag,
>        GROUPING(sal) AS sal_flag
> FROM emps_data
> GROUP BY GROUPING SETS ((deptno, job), (deptno, sal), (deptno, job))
> ORDER BY deptno, job, sal;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to