[jira] [Commented] (IMPALA-13483) Calcite Planner: some scalar subquery throws exception when handle single_value

Steve Carlin (Jira) Tue, 23 Sep 2025 12:45:22 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022254#comment-18022254
 ]


Steve Carlin commented on IMPALA-13483:
---------------------------------------

This is an interesting bug for sure.  My latest local version still has this 
problem. 

As you mention, it is definitely an issue related to SINGLE_VALUE in the 
aggregate.  There is no corresponding SINGLE_VALUE function in Impala right 
now.  

In this case, it seems that the SINGLE_VALUE argument references the same 
field as the group key.  I think we can fix this by adding a rule (or perhaps 
just a Shuttle, since it only needs to happen once).

The rule/shuttle should look for a SINGLE_VALUE function within the 
Aggregate.  It should then check whether the SINGLE_VALUE output is referenced 
by a parent RelNode.  If it isn't, the aggregate can be turned into a 
cardinality check node (code for this already exists).  If it is referenced, 
we should check that the underlying RelNode reads the same field for the 
SINGLE_VALUE argument as for the group key, as it does in this case:
      LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)]), id = 714
        LogicalProject(c11=[$0], C1=[$0]), id = 713
We can change the RelNode tree to something like the following, which should 
not have an impact on performance:


    LogicalProject(f1=[$0], f2=[$0])
      LogicalAggregate(group=[{0}]), id = 714
        ...

This seems to be more of a feature than a bug, though, as this query is not 
supported in the current version of Impala.
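
To make the proposed checks concrete, here is a minimal sketch of the 
decision logic only.  It is a toy model: the class, enum, and method names are 
hypothetical, and it deliberately does not use Calcite's RelNode, RelOptRule, 
or RelShuttle APIs.  Fields are modeled as plain ordinals, and the Project 
under the Aggregate is modeled as a list of the input ordinals it reads.  
An actual rule or shuttle would need to do the equivalent against the real 
plan nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical stand-ins, NOT Calcite's RelNode API.
final class SingleValueRewriteSketch {

  enum Action { LEAVE_UNCHANGED, CARDINALITY_CHECK, PROJECT_GROUP_KEY, UNSUPPORTED }

  // groupKeys:      ordinals the Aggregate groups by, e.g. [0] for group=[{0}]
  // singleValueArg: ordinal passed to SINGLE_VALUE, or -1 if there is none
  // projectExprs:   field read by each output of the Project below the
  //                 Aggregate, e.g. [0, 0] for c11=[$0], C1=[$0]
  // usedByParent:   whether a parent node references the SINGLE_VALUE output
  static Action decide(List<Integer> groupKeys, int singleValueArg,
      List<Integer> projectExprs, boolean usedByParent) {
    if (singleValueArg < 0) {
      return Action.LEAVE_UNCHANGED;     // no SINGLE_VALUE in this Aggregate
    }
    if (!usedByParent) {
      return Action.CARDINALITY_CHECK;   // value unused, only the check matters
    }
    return matchingGroupPosition(groupKeys, singleValueArg, projectExprs) >= 0
        ? Action.PROJECT_GROUP_KEY
        : Action.UNSUPPORTED;
  }

  // Position (within the group keys) of a key that reads the same underlying
  // field as the SINGLE_VALUE argument, or -1 if there is no such key.
  static int matchingGroupPosition(List<Integer> groupKeys, int singleValueArg,
      List<Integer> projectExprs) {
    int source = projectExprs.get(singleValueArg);
    for (int i = 0; i < groupKeys.size(); i++) {
      if (projectExprs.get(groupKeys.get(i)) == source) {
        return i;
      }
    }
    return -1;
  }

  // Outputs of the new Project placed on top of Aggregate(group only):
  // one reference per group key, then the matched key in place of SINGLE_VALUE.
  static List<Integer> rewrittenProject(List<Integer> groupKeys,
      int singleValueArg, List<Integer> projectExprs) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < groupKeys.size(); i++) {
      out.add(i);
    }
    out.add(matchingGroupPosition(groupKeys, singleValueArg, projectExprs));
    return out;
  }

  public static void main(String[] args) {
    // Plan from this issue: group=[{0}], agg#0=[SINGLE_VALUE($1)]
    // over Project(c11=[$0], C1=[$0]); the join condition reads the agg output.
    List<Integer> groupKeys = Arrays.asList(0);
    List<Integer> projectExprs = Arrays.asList(0, 0);
    System.out.println(decide(groupKeys, 1, projectExprs, true));
    System.out.println(rewrittenProject(groupKeys, 1, projectExprs));
  }
}
```

For the plan in this issue, the sketch picks the group-key rewrite and yields 
project outputs [0, 0], which corresponds to the f1=[$0], f2=[$0] shape 
above.  It models only the plan-shape decision, not the cardinality check 
node itself.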

> Calcite Planner: some scalar subquery throws exception when handle 
> single_value
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-13483
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13483
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: weihua zhang
>            Priority: Major
>
> {code:sql}
> create table correlated_scalar_t1(c1 bigint, c2 bigint);
> create table correlated_scalar_t2(c1 bigint, c2 bigint);
> insert into correlated_scalar_t1 values (1,null),(null,1),(1,2), 
> (null,2),(1,3), (2,4), (2,5), (3,3), (3,4), (20,2), (22,3), 
> (24,4),(null,null);
> insert into correlated_scalar_t2 values (1,null),(null,1),(1,4), (1,2), 
> (null,3), (2,4), (3,7), (3,9),(null,null),(5,1);
> select c1 from correlated_scalar_t1 where correlated_scalar_t1.c2 > (select 
> c1 from correlated_scalar_t2 where correlated_scalar_t1.c1 = 
> correlated_scalar_t2.c1 and correlated_scalar_t2.c2 < 4) order by c1;{code}
> {code:java}
> LogicalSort(sort0=[$0], dir0=[ASC]), id = 717
>   LogicalProject(C1=[$0]), id = 716
>     LogicalJoin(condition=[AND(=($0, $2), >($1, $3))], joinType=[inner]), id = 715
>       LogicalTableScan(table=[[default, correlated_scalar_t1]]), id = 547
>       LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)]), id = 714
>         LogicalProject(c11=[$0], C1=[$0]), id = 713
>           LogicalFilter(condition=[AND(<($1, 4), IS NOT NULL($0))]), id = 712
>             LogicalTableScan(table=[[default, correlated_scalar_t2]]), id = 549
> {code}
> This fails with java.lang.IndexOutOfBoundsException: Index: 3, Size: 3,
> which may be related to SINGLE_VALUE.
> Hive plan for the same query:
> {code:java}
> explain cbo select c1 from correlated_scalar_t1
> where correlated_scalar_t1.c2 > (select c1 from correlated_scalar_t2
>   where correlated_scalar_t1.c1 = correlated_scalar_t2.c1
>   and correlated_scalar_t2.c2 < 4) order by c1;
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> | CBO PLAN:                                          |
> | HiveSortLimit(sort0=[$0], dir0=[ASC])              |
> |   HiveProject(c1=[$0])                             |
> |     HiveJoin(condition=[AND(=($0, $4), >($1, $3))], joinType=[inner], algorithm=[none], cost=[not available]) |
> |       HiveJoin(condition=[=($0, $2)], joinType=[left], algorithm=[none], cost=[not available]) |
> |         HiveProject(c1=[$0], c2=[$1])              |
> |           HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))]) |
> |             HiveTableScan(table=[[default, correlated_scalar_t1]], table:alias=[correlated_scalar_t1]) |
> |         HiveProject(c10=[$0])                      |
> |           HiveFilter(condition=[sq_count_check($1)]) |
> |             HiveAggregate(group=[{0}], cnt=[COUNT()]) |
> |               HiveFilter(condition=[AND(<($1, 4), IS NOT NULL($0))]) |
> |                 HiveTableScan(table=[[default, correlated_scalar_t2]], table:alias=[correlated_scalar_t2]) |
> |       HiveProject(c1=[$0], c10=[$0])               |
> |         HiveFilter(condition=[AND(<($1, 4), IS NOT NULL($0))]) |
> |           HiveTableScan(table=[[default, correlated_scalar_t2]], table:alias=[correlated_scalar_t2]) |
> |                                                    |
> +----------------------------------------------------+
> 17 rows selected (0.935 seconds)
> {code}



