[ https://issues.apache.org/jira/browse/HIVE-24999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez updated HIVE-24999:
-------------------------------------------
    Fix Version/s: 4.0.0

> HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple correlations
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-24999
>                 URL: https://issues.apache.org/jira/browse/HIVE-24999
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>             Fix For: 4.0.0
>
>
> The problem can be reproduced with the following query, which can currently be found in the {{subquery_in.q}} file:
> {code:sql}
> explain cbo select * from part where p_name IN (select p_name from part p where p.p_size = part.p_size AND part.p_size + 121150 = p.p_partkey );
> {code}
> The plans before and after {{HiveSubQueryRemoveRule}} are shown below:
> {noformat}
> 2021-04-09T14:29:08,031 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan before removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
>   HiveFilter(condition=[IN($1, {
> HiveProject(p_name=[$1])
>   HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
>     HiveTableScan(table=[[default, part]], table:alias=[p])
> })])
>     HiveTableScan(table=[[default, part]], table:alias=[part])
> 2021-04-09T14:29:08,056 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] parse.CalcitePlanner: Plan just after removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
>   HiveFilter(condition=[=($1, $12)])
>     LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{5}])
>       HiveTableScan(table=[[default, part]], table:alias=[part])
>       HiveProject(p_name=[$1])
>         HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), $0))])
>           HiveTableScan(table=[[default, part]], table:alias=[p])
> {noformat}
> The plan after applying the rule is invalid. The {{HiveFilter(condition=[=($1, $12)])}} above the correlate references a column ($12) from the right input, which does not exist in the correlate's output because the correlate is of type SEMI. Running the test with the {{-Dcalcite.debug}} property enabled raises an {{AssertionError}} when building the {{HiveFilter}}.
> The problem is currently hidden because there is a specific hack in {{HiveRelDecorrelator}} that turns this invalid plan into a valid one. This mechanism is very brittle and can break easily, as happened while fixing HIVE-24957.
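
To illustrate why the filter above the correlate is invalid, here is a minimal sketch in Python of the field-count argument. It is a toy model, not Calcite or Hive code: the helper names `output_field_count` and `invalid_refs` are hypothetical, and the left-input width of 12 fields is an assumption (the report does not state it; it is chosen so that `$12` would be the first right-hand field).

```python
def output_field_count(left_fields: int, right_fields: int, join_type: str) -> int:
    """Toy model: number of fields a join/correlate exposes to its parent.

    SEMI and ANTI joins only emit the left input's fields; the other
    join types emit the left fields followed by the right fields.
    """
    if join_type in ("semi", "anti"):
        return left_fields
    return left_fields + right_fields


def invalid_refs(refs, field_count):
    """Return the input references ($n) that fall outside the row type."""
    return [r for r in refs if not 0 <= r < field_count]


# Assumption: the scan of `part` exposes 12 fields, so $12 would be the
# first field coming from the right input.
LEFT_FIELDS = 12
# The right input is HiveProject(p_name=[$1]), i.e. a single field.
RIGHT_FIELDS = 1

# Fields referenced by HiveFilter(condition=[=($1, $12)]).
filter_refs = [1, 12]

semi_width = output_field_count(LEFT_FIELDS, RIGHT_FIELDS, "semi")
inner_width = output_field_count(LEFT_FIELDS, RIGHT_FIELDS, "inner")

print("semi:", invalid_refs(filter_refs, semi_width))
print("inner:", invalid_refs(filter_refs, inner_width))
```

Under these assumptions, `$12` is out of range above a SEMI correlate but would be in range above an INNER one, which matches the report's explanation that the filter references a right-hand column the SEMI correlate does not produce.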