[jira] [Comment Edited] (CALCITE-7010) The well-known count bug

Mihai Budiu (Jira) Sun, 22 Jun 2025 09:57:18 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17985168#comment-17985168
 ]


Mihai Budiu edited comment on CALCITE-7010 at 6/22/25 4:55 PM:
---------------------------------------------------------------

[~suibianwanwan33] I think this PR has actually introduced a regression.
Consider the following query using the foodmart database:

{code}
create view v as
select deptno,
       (select count(*) from emp where deptno = dept.deptno) as x
from dept;
{code}

The plan before the decorrelator is:

{code}
    LogicalProject(deptno=[$0], x=[$3]), id = 426
      LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{0}]), id = 424
        LogicalTableScan(table=[[schema, dept]]), id = 389
        LogicalAggregate(group=[{}], EXPR$0=[COUNT()]), id = 422
          LogicalFilter(condition=[=($8, $cor0.deptno)]), id = 420
            LogicalTableScan(table=[[schema, emp]]), id = 391
{code}

The plan I get after the decorrelator is:

{code}
    LogicalProject(deptno=[$0], x=[$3]), id = 436
      LogicalProject(deptno=[$0], dname=[$1], loc=[$2], EXPR$0=[CASE(IS NULL($4), 0:BIGINT, $4)]), id = 510
        LogicalJoin(condition=[=($0, $3)], joinType=[left]), id = 508
          LogicalTableScan(table=[[schema, dept]]), id = 389
          LogicalProject(deptno=[$0], EXPR$0=[CASE(IS NOT NULL($2), $2, 0)]), id = 506
            LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[left]), id = 504
              LogicalAggregate(group=[{0}]), id = 496
                LogicalTableScan(table=[[schema, dept]]), id = 389
              LogicalAggregate(group=[{0}], EXPR$0=[COUNT()]), id = 502
                LogicalProject(deptno=[$8]), id = 500
                  LogicalFilter(condition=[IS NOT NULL($8)]), id = 498
                    LogicalTableScan(table=[[schema, emp]]), id = 391
{code}

This plan joins by comparing dept with deptno (using IS NOT DISTINCT FROM),
which doesn't make much sense.
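
As a sanity check on what any decorrelated plan has to return for this query, here is a minimal sketch that runs the same SQL through SQLite instead of Calcite. The table contents are invented for illustration (they are not the real dept/emp data), and the hand-written left-join rewrite is only one possible decorrelation, not the plan Calcite is expected to produce. The inner-join variant is included to show the count bug itself: a department with no employees disappears instead of getting a count of 0.

```python
import sqlite3

# Hypothetical miniature dept/emp tables; all rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'A', 'X'), (20, 'B', 'Y'), (40, 'C', 'Z');
    INSERT INTO emp VALUES (1, 10), (2, 10), (3, 20), (4, NULL);
""")

# The correlated scalar subquery from the view definition.
correlated = conn.execute("""
    SELECT deptno,
           (SELECT COUNT(*) FROM emp WHERE emp.deptno = dept.deptno) AS x
    FROM dept
    ORDER BY deptno
""").fetchall()

# A hand-written decorrelation: aggregate emp once, left join, and map the
# NULL produced for unmatched departments back to 0.
left_join = conn.execute("""
    SELECT d.deptno, COALESCE(c.cnt, 0) AS x
    FROM dept d
    LEFT JOIN (SELECT deptno, COUNT(*) AS cnt
               FROM emp
               WHERE deptno IS NOT NULL
               GROUP BY deptno) c
      ON d.deptno = c.deptno
    ORDER BY d.deptno
""").fetchall()

# The naive rewrite with an inner join: this one exhibits the count bug.
inner_join = conn.execute("""
    SELECT d.deptno, c.cnt AS x
    FROM dept d
    JOIN (SELECT deptno, COUNT(*) AS cnt
          FROM emp
          GROUP BY deptno) c
      ON d.deptno = c.deptno
    ORDER BY d.deptno
""").fetchall()

print(correlated)  # [(10, 2), (20, 1), (40, 0)]
print(left_join)   # [(10, 2), (20, 1), (40, 0)]
print(inner_join)  # [(10, 2), (20, 1)]
```

Whatever shape the decorrelated plan takes, its output for the view should match the first result: one row per dept row, with x = 0 for departments that have no employees.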


> The well-known count bug
> ------------------------
>
>                 Key: CALCITE-7010
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7010
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: suibianwanwan
>            Assignee: suibianwanwan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.41.0
>
>
> What is the count-bug: [Optimization of Nested SQL Queries 
> Revisited|https://dl.acm.org/doi/pdf/10.1145/38714.38723]
> {quote}The well-known "count-bug" is not specific to the count aggregate, and 
> outer-join does not solve it. The anomaly can occur on any aggregate 
> function; aggregates need modification to distinguish empty set from null 
> values; and optimizing out the outerjoin depends on utilization context
> {quote}
> Test in sub-query.iq:
> {code:java}
> SELECT deptno
> FROM dept d
> WHERE 0 IN (
>     SELECT COUNT(*)
>     FROM emp e
>     WHERE d.deptno = e.deptno
> );
> +--------+
> | DEPTNO |
> +--------+
> |     40 |
> +--------+
> (1 row)
> !ok
> SELECT deptno
> FROM dept d
> WHERE 'Regular' IN (
>     SELECT CASE WHEN SUM(sal) > 10 then 'VIP' else 'Regular' END expr
>     FROM emp e
>     WHERE d.deptno = e.deptno
> );
> +--------+
> | DEPTNO |
> +--------+
> |     40 |
> +--------+
> (1 row)
> !ok
> {code}
> Actual results:
> {code:java}
> +--------+
> | DEPTNO |
> +--------+
> +--------+
> (0 rows)
> +--------+
> | DEPTNO |
> +--------+
> +--------+
> (0 rows)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
