[
https://issues.apache.org/jira/browse/CALCITE-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012936#comment-18012936
]
Zhen Chen edited comment on CALCITE-7126 at 8/9/25 12:01 AM:
-------------------------------------------------------------
[~julianhyde] The core issue is indeed as described in the Jira summary.
However, while fixing this problem, I encountered another issue (namely the
so-called "allowChangeNullable" problem). I will try to explain this issue in
detail and as clearly as possible. For example, In the original logic,
{code:java}
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}, {}]])
{code}
would be expand to
{code:java}
Union
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
Aggregate(group=[{0, 1, 2}], groups=[[{}]])
{code}
The second AGG is incorrect, the correct one should be like
{code:java}
Aggregate(group=[{}])
{code}
So here comes the next problem.
For a simple SQL (such as “SELECT SUM(a) FROM tbl”), the original logic would
infer `SUM` as nullable = TRUE. However, when there is a GROUP BY (e.g.,
“SELECT SUM(a) FROM t GROUP BY b”), SUM remains inferred as nullable = FALSE.
There is a special case involving operations like ROLLUP, where the last GROUP
SET is an empty. In this logic, when encountering grouping, all GROUP SETS need
to be fully expanded. Just like this:
{code:java}
Union
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
Aggregate(group=[{0, 1}], groups=[[{0, 1}]])
Aggregate(group=[{0}], groups=[[{0}]])
Aggregate(group=[{}]])
{code}
According to the original logic, if there is a SUM function in the final AGG,
then it would infer SUM as nullable = TRUE. However, this behavior creates an
inconsistency with the previous AGGs. Simply checking the "groupSet" property
in the AGG is not sufficient to determine whether this AGG was generated during
the rewrite process. Therefore, it is necessary to set a marker somewhere to
indicate that the nullable property of SUM functions whose "groupSet" is 0 and
are produced by the rewrite process should not be modified.
This is the problem that needs to be addressed throughout the entire process.
Do you have any good suggestions for me?
was (Author: jensen):
[~julianhyde] The core issue is indeed as described in the Jira summary.
However, while fixing this problem, I encountered another issue (namely the
so-called "allowChangeNullable" problem). I will try to explain this issue in
detail and as clearly as possible. For example, In the original logic,
{code:java}
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}, {}]])
{code}
would be expand to
{code:java}
Union
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
Aggregate(group=[{0, 1, 2}], groups=[[{}]])
{code}
The second AGG is incorrect, the correct one should be like
{code:java}
Aggregate(group=[{}])
{code}
So here comes the next problem.
For a simple SQL (such as “SELECT SUM(a) FROM tbl”), the original logic would
infer `SUM` as nullable = TRUE. However, when there is a GROUP BY (e.g.,
“SELECT SUM(a) FROM t GROUP BY b”), SUM remains inferred as nullable = FALSE.
There is a special case involving operations like ROLLUP (i.e., {0,1,2}, {0,1},
{0}, {}), where the last GROUP SET is an empty. In this logic, when
encountering grouping, all GROUP SETS need to be fully expanded. Just like this:
{code:java}
Union
Aggregate(group=[{0, 1, 2}], groups=[[{0, 1, 2}]])
Aggregate(group=[{0, 1}], groups=[[{0, 1}]])
Aggregate(group=[{0}], groups=[[{0}]])
Aggregate(group=[{}]])
{code}
According to the original logic, if there is a SUM function in the final AGG,
then it would infer SUM as nullable = TRUE. However, this behavior creates an
inconsistency with the previous AGGs. Simply checking the "groupSet" property
in the AGG is not sufficient to determine whether this AGG was generated during
the rewrite process. Therefore, it is necessary to set a marker somewhere to
indicate that the nullable property of SUM functions whose "groupSet" is 0 and
are produced by the rewrite process should not be modified.
This is the problem that needs to be addressed throughout the entire process.
Do you have any good suggestions for me?
> The calculation result of grouping function is wrong
> ----------------------------------------------------
>
> Key: CALCITE-7126
> URL: https://issues.apache.org/jira/browse/CALCITE-7126
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.40.0
> Reporter: Zhen Chen
> Priority: Major
> Labels: pull-request-available
>
> The calculation result of grouping function is wrong.
> in agg.iq result is
> {code:java}
> +--------+-----+-------+--------+-------------+----------+----------+
> | DEPTNO | JOB | SAL | EXPR$3 | DEPTNO_FLAG | JOB_FLAG | SAL_FLAG |
> +--------+-----+-------+--------+-------------+----------+----------+
> | 10 | aa | | 1250 | 0 | 0 | 1 |
> | 10 | aa | | 1250 | 0 | 0 | 0 |
> | 10 | | 7000 | | 0 | 1 | 0 |
> | 10 | | 10000 | 1000 | 0 | 1 | 0 |
> | 10 | | 11500 | 250 | 0 | 1 | 0 |
> | 20 | aa | | 500 | 0 | 0 | 1 |
> | 20 | aa | | 500 | 0 | 0 | 0 |
> | 20 | | 8000 | 500 | 0 | 1 | 0 |
> +--------+-----+-------+--------+-------------+----------+----------+
> (8 rows)
> {code}
> in pgsql result is
> {code:java}
> deptno | job | sal | sum | deptno_flag | job_flag | sal_flag
> --------+-----+-------+------+-------------+----------+----------
> 10 | aa | | 1250 | 0 | 0 | 1
> 10 | aa | | 1250 | 0 | 0 | 1
> 10 | | 7000 | | 0 | 1 | 0
> 10 | | 10000 | 1000 | 0 | 1 | 0
> 10 | | 11500 | 250 | 0 | 1 | 0
> 20 | aa | | 500 | 0 | 0 | 1
> 20 | aa | | 500 | 0 | 0 | 1
> 20 | | 8000 | 500 | 0 | 1 | 0
> (8 rows)
> {code}
> The sal_flag column has incorrect values in rows 2 and 7.
> SQL is
> {code:java}
> WITH emps_data AS (
> SELECT * FROM (VALUES
> (100, 10, 'Bill', 10000, 1000, 'aa'),
> (110, 10, 'Theodore', 11500, 250, 'aa'),
> (150, 10, 'Sebastian', 7000, NULL, 'aa'),
> (200, 20, 'Eric', 8000, 500, 'aa')
> ) AS t(empno, deptno, ename, sal, comm, job)
> )
> SELECT deptno, job, sal, SUM(comm),
> GROUPING(deptno) AS deptno_flag,
> GROUPING(job) AS job_flag,
> GROUPING(sal) AS sal_flag
> FROM emps_data
> GROUP BY GROUPING SETS ((deptno, job), (deptno, sal), (deptno, job))
> ORDER BY deptno, job, sal;
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)