[ https://issues.apache.org/jira/browse/CALCITE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17985168#comment-17985168 ]
Mihai Budiu edited comment on CALCITE-7010 at 6/22/25 4:55 PM:
---------------------------------------------------------------
[~suibianwanwan33] I think this PR has actually introduced a regression.
Consider the following query using the foodmart database:
{code}
create view v as
select deptno,
  (select count(*) from emp where deptno = dept.deptno) as x
from dept;
{code}
The plan before the decorrelator is:
{code}
LogicalProject(deptno=[$0], x=[$3]), id = 426
  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{0}]), id = 424
    LogicalTableScan(table=[[schema, dept]]), id = 389
    LogicalAggregate(group=[{}], EXPR$0=[COUNT()]), id = 422
      LogicalFilter(condition=[=($8, $cor0.deptno)]), id = 420
        LogicalTableScan(table=[[schema, emp]]), id = 391
{code}
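For reference, the plan before decorrelation has simple row-at-a-time semantics. LogicalCorrelate re-evaluates the aggregate for every dept row, and COUNT() over an empty input yields 0. Here is a minimal Java sketch of that nested-loop evaluation. It is not Calcite code: the Dept/Emp records, the CorrelateSketch class and the toy rows are hypothetical.
{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CorrelateSketch {
  // Hypothetical, minimal row types: only the columns the view touches.
  record Dept(int deptno, String dname) {}
  record Emp(String ename, Integer deptno) {}

  /** Nested-loop model of LogicalCorrelate(joinType=[left]) over the scalar COUNT() subquery. */
  static Map<Integer, Long> correlatedCounts(List<Dept> depts, List<Emp> emps) {
    Map<Integer, Long> x = new LinkedHashMap<>();
    for (Dept d : depts) {            // one evaluation of the right input per dept row ($cor0)
      long count = 0;                 // COUNT() over an empty input is 0, never NULL
      for (Emp e : emps) {
        // LogicalFilter(=($8, $cor0.deptno)): a NULL emp.deptno never compares equal
        if (e.deptno() != null && e.deptno() == d.deptno()) {
          count++;
        }
      }
      x.put(d.deptno(), count);       // LogicalProject(deptno=[$0], x=[$3])
    }
    return x;
  }

  public static void main(String[] args) {
    List<Dept> depts = List.of(new Dept(10, "ACCOUNTING"), new Dept(40, "OPERATIONS"));
    List<Emp> emps = List.of(new Emp("CLARK", 10), new Emp("KING", 10), new Emp("TEMP", null));
    System.out.println(correlatedCounts(depts, emps)); // {10=2, 40=0}
  }
}
{code}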
The plan I get after the decorrelator is:
{code}
LogicalProject(deptno=[$0], x=[$3]), id = 436
  LogicalProject(deptno=[$0], dname=[$1], loc=[$2], EXPR$0=[CASE(IS NULL($4), 0:BIGINT, $4)]), id = 510
    LogicalJoin(condition=[=($0, $3)], joinType=[left]), id = 508
      LogicalTableScan(table=[[schema, dept]]), id = 389
      LogicalProject(deptno=[$0], EXPR$0=[CASE(IS NOT NULL($2), $2, 0)]), id = 506
        LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[left]), id = 504
          LogicalAggregate(group=[{0}]), id = 496
            LogicalTableScan(table=[[schema, dept]]), id = 389
          LogicalAggregate(group=[{0}], EXPR$0=[COUNT()]), id = 502
            LogicalProject(deptno=[$8]), id = 500
              LogicalFilter(condition=[IS NOT NULL($8)]), id = 498
                LogicalTableScan(table=[[schema, emp]]), id = 391
{code}
This plan joins dept with deptno using IS NOT DISTINCT FROM, which doesn't make much sense.
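The following rough Java model evaluates the decorrelated plan operator by operator over in-memory lists, which makes its data flow easier to follow. It is illustrative only and not Calcite code; the Dept/Emp records, the DecorrelatedPlanSketch class and the toy rows are hypothetical. The model shows that dept is scanned twice. The first scan builds the distinct keys for the IS NOT DISTINCT FROM join (ids 496/504). The second scan feeds the outer left join (id 508). A NULL-to-0 CASE follows each join.
{code:java}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DecorrelatedPlanSketch {
  // Hypothetical row types; not Calcite classes.
  record Dept(int deptno, String dname) {}
  record Emp(String ename, Integer deptno) {}

  static Map<Integer, Long> decorrelatedCounts(List<Dept> depts, List<Emp> emps) {
    // id = 496: LogicalAggregate(group=[{0}]) over dept, i.e. the distinct correlation keys.
    Set<Integer> keys = new LinkedHashSet<>();
    for (Dept d : depts) {
      keys.add(d.deptno());
    }

    // ids 498, 500, 502: drop NULL emp.deptno, then COUNT() per deptno.
    Map<Integer, Long> countsByDeptno = new HashMap<>();
    for (Emp e : emps) {
      if (e.deptno() != null) {
        countsByDeptno.merge(e.deptno(), 1L, Long::sum);
      }
    }

    // id = 504: left join on IS NOT DISTINCT FROM($0, $1); id = 506: CASE(IS NOT NULL($2), $2, 0).
    Map<Integer, Long> countPerKey = new HashMap<>();
    for (Integer key : keys) {
      Long c = countsByDeptno.get(key); // null when the left join found no match
      countPerKey.put(key, c != null ? c : 0L);
    }

    // id = 508: left join dept with that result on =($0, $3);
    // id = 510: CASE(IS NULL($4), 0, $4); id = 436 keeps (deptno, x).
    Map<Integer, Long> x = new LinkedHashMap<>();
    for (Dept d : depts) {
      Long c = countPerKey.get(d.deptno()); // never null here: every dept key went through id = 504
      x.put(d.deptno(), c == null ? 0L : c);
    }
    return x;
  }

  public static void main(String[] args) {
    List<Dept> depts = List.of(new Dept(10, "ACCOUNTING"), new Dept(20, "RESEARCH"), new Dept(40, "OPERATIONS"));
    List<Emp> emps = List.of(new Emp("CLARK", 10), new Emp("KING", 10), new Emp("SMITH", 20), new Emp("TEMP", null));
    System.out.println(decorrelatedCounts(depts, emps)); // {10=2, 20=1, 40=0}
  }
}
{code}
On these toy rows the model returns the same counts as a row-by-row evaluation of the correlated query. It says nothing about cost or plan shape, which is what the comment above is about.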
> The well-known count bug
> ------------------------
>
> Key: CALCITE-7010
> URL: https://issues.apache.org/jira/browse/CALCITE-7010
> Project: Calcite
> Issue Type: Bug
> Reporter: suibianwanwan
> Assignee: suibianwanwan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.41.0
>
>
> What is the count-bug: [Optimization of Nested SQL Queries Revisited|https://dl.acm.org/doi/pdf/10.1145/38714.38723]
> {quote}The well-known "count-bug" is not specific to the count aggregate, and
> outer-join does not solve it. The anomaly can occur on any aggregate
> function; aggregates need modification to distinguish empty set from null
> values; and optimizing out the outerjoin depends on utilization context
> {quote}
> Test in sub-query.iq:
> {code:sql}
> SELECT deptno
> FROM dept d
> WHERE 0 IN (
> SELECT COUNT(*)
> FROM emp e
> WHERE d.deptno = e.deptno
> );
> +--------+
> | DEPTNO |
> +--------+
> | 40 |
> +--------+
> (1 row)
> !ok
> SELECT deptno
> FROM dept d
> WHERE 'Regular' IN (
> SELECT CASE WHEN SUM(sal) > 10 then 'VIP' else 'Regular' END expr
> FROM emp e
> WHERE d.deptno = e.deptno
> );
> +--------+
> | DEPTNO |
> +--------+
> | 40 |
> +--------+
> (1 row)
> !ok
> {code}
> Actual results:
> {code:java}
> +--------+
> | DEPTNO |
> +--------+
> +--------+
> (0 rows)
> +--------+
> | DEPTNO |
> +--------+
> +--------+
> (0 rows)
> {code}
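The count bug described above can be reproduced outside Calcite with a small Java model. This is a hedged sketch, not Calcite's decorrelator; the records, the CountBugSketch class, the toy rows and the "naive" rewrite are all hypothetical. The toy rows leave dept 40 without employees, as in the expected output. The model assumes one faulty rewrite: it computes the aggregate per deptno first and then left-joins the result to dept. A department with no employees therefore gets NULL instead of the subquery's value on the empty set, which is 0 for COUNT(*) and 'Regular' for the CASE/SUM expression. Under SQL three-valued logic, 0 IN (NULL) is not true, so WHERE drops dept 40. That is one way to end up with the 0-row actual results shown above.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CountBugSketch {
  // Hypothetical toy rows; NULL deptno values are left out to keep the sketch short.
  record Dept(int deptno) {}
  record Emp(int deptno, int sal) {}

  // SELECT COUNT(*) ...
  static final Function<List<Emp>, Long> COUNT_STAR = g -> (long) g.size();

  // SELECT CASE WHEN SUM(sal) > 10 THEN 'VIP' ELSE 'Regular' END ...
  static final Function<List<Emp>, String> VIP_OR_REGULAR = g -> {
    Long sum = null; // SUM over an empty set is NULL
    for (Emp e : g) {
      sum = (sum == null ? 0L : sum) + e.sal();
    }
    // NULL > 10 is UNKNOWN, so the CASE falls through to ELSE
    return (sum != null && sum > 10) ? "VIP" : "Regular";
  };

  /** Correlated semantics: evaluate the subquery on the (possibly empty) set of matching emps. */
  static <T> Map<Integer, T> correlated(List<Dept> depts, List<Emp> emps, Function<List<Emp>, T> subquery) {
    Map<Integer, T> out = new LinkedHashMap<>();
    for (Dept d : depts) {
      List<Emp> group = new ArrayList<>();
      for (Emp e : emps) {
        if (e.deptno() == d.deptno()) {
          group.add(e);
        }
      }
      out.put(d.deptno(), subquery.apply(group));
    }
    return out;
  }

  /** Naive rewrite: aggregate emp GROUP BY deptno, then LEFT JOIN; a dept with no emps gets NULL. */
  static <T> Map<Integer, T> naiveOuterJoin(List<Dept> depts, List<Emp> emps, Function<List<Emp>, T> subquery) {
    Map<Integer, List<Emp>> groups = new HashMap<>();
    for (Emp e : emps) {
      groups.computeIfAbsent(e.deptno(), k -> new ArrayList<>()).add(e);
    }
    Map<Integer, T> out = new LinkedHashMap<>();
    for (Dept d : depts) {
      List<Emp> group = groups.get(d.deptno());
      out.put(d.deptno(), group == null ? null : subquery.apply(group)); // outer join pads with NULL
    }
    return out;
  }

  /** WHERE value IN (subquery): comparing with NULL is UNKNOWN, and WHERE keeps only TRUE rows. */
  static <T> List<Integer> filterIn(Map<Integer, T> perDept, T value) {
    List<Integer> kept = new ArrayList<>();
    perDept.forEach((deptno, v) -> {
      if (v != null && v.equals(value)) {
        kept.add(deptno);
      }
    });
    return kept;
  }

  public static void main(String[] args) {
    List<Dept> depts = List.of(new Dept(10), new Dept(20), new Dept(30), new Dept(40));
    List<Emp> emps = List.of(new Emp(10, 5000), new Emp(20, 3000), new Emp(30, 2850));
    System.out.println(filterIn(correlated(depts, emps, COUNT_STAR), 0L));                // [40]
    System.out.println(filterIn(naiveOuterJoin(depts, emps, COUNT_STAR), 0L));            // []
    System.out.println(filterIn(correlated(depts, emps, VIP_OR_REGULAR), "Regular"));     // [40]
    System.out.println(filterIn(naiveOuterJoin(depts, emps, VIP_OR_REGULAR), "Regular")); // []
  }
}
{code}
The model can be patched by substituting the subquery's value on the empty set for depts that had no matching group. Keying the substitution off "no group" rather than off "value is NULL" brings dept 40 back for both queries. This follows the quoted paper's point that aggregates must distinguish the empty set from null values.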
--
This message was sent by Atlassian Jira
(v8.20.10#820010)