[ 
https://issues.apache.org/jira/browse/HIVE-24999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24999:
-------------------------------------------
    Fix Version/s: 4.0.0

> HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple 
> correlations
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-24999
>                 URL: https://issues.apache.org/jira/browse/HIVE-24999
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>             Fix For: 4.0.0
>
>
> The problem can be reproduced with the following query, which is currently 
> in the {{subquery_in.q}} file:
> {code:sql}
> explain cbo select * from part where p_name IN (select p_name from part p 
> where p.p_size = part.p_size AND part.p_size + 121150 = p.p_partkey );
> {code}
> The plans before and after {{HiveSubQueryRemoveRule}} are shown below:
> {noformat}
> 2021-04-09T14:29:08,031 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan before removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
>   HiveFilter(condition=[IN($1, {
> HiveProject(p_name=[$1])
>   HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
>     HiveTableScan(table=[[default, part]], table:alias=[p])
> })])
>     HiveTableScan(table=[[default, part]], table:alias=[part])
> 2021-04-09T14:29:08,056 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan just after removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
>   HiveFilter(condition=[=($1, $12)])
>     LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{5}])
>       HiveTableScan(table=[[default, part]], table:alias=[part])
>       HiveProject(p_name=[$1])
>         HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
>           HiveTableScan(table=[[default, part]], table:alias=[p])
> {noformat}
> The plan after applying the rule is invalid. The 
> {{HiveFilter(condition=[=($1, $12)])}} above the correlate references a column 
> ($12) of the right input, which does not exist in the correlate's output 
> because the correlate is of type SEMI. Running the test with the 
> {{-Dcalcite.debug}} property enabled raises an {{AssertionError}} when 
> building the {{HiveFilter}}.
> The problem is currently hidden because a specific hack in 
> {{HiveRelDecorrelator}} turns this invalid plan into a valid one. This 
> mechanism is very brittle and can break easily, as happened while fixing 
> HIVE-24957.
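
The index arithmetic behind the invalid reference can be sketched as follows. This is an illustrative model only, not Hive or Calcite code: the helper names `correlate_output_width` and `invalid_refs` are hypothetical, and the sketch assumes only that a semi (or anti) correlate exposes the columns of its left input and nothing from the right.

```python
# Illustrative model only; these helpers are hypothetical and are not part of Hive or Calcite.

# Columns of the `part` table, taken from the HiveProject at the top of the plan.
PART_COLUMNS = [
    "p_partkey", "p_name", "p_mfgr", "p_brand", "p_type",
    "p_size", "p_container", "p_retailprice", "p_comment",
]


def correlate_output_width(left_width, right_width, join_type):
    """Return how many fields a correlate of the given type exposes to its parent."""
    if join_type in ("semi", "anti"):
        # Assumption stated above: semi/anti correlates only filter the left
        # input, so no right-side column appears in the output row.
        return left_width
    return left_width + right_width


def invalid_refs(field_indexes, input_width):
    """Return the referenced field indexes that fall outside the input row."""
    return [i for i in field_indexes if i >= input_width]


left_width = len(PART_COLUMNS)  # fields $0..$8 from the outer scan of `part`
right_width = 1                 # HiveProject(p_name=[$1]) on the right side

width = correlate_output_width(left_width, right_width, "semi")

# The filter above the correlate is =($1, $12), so it references fields 1 and 12.
print(width)                         # 9
print(invalid_refs([1, 12], width))  # [12]
```

Under this model the filter can only see fields $0 to $8, so $12 has nothing to bind to, which is consistent with the {{AssertionError}} reported when the {{HiveFilter}} is built.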



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
