[ https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311521#comment-17311521 ]
Stamatis Zampetakis commented on HIVE-24957: -------------------------------------------- The problem lies in the query plan and more specifically in the {{HiveRelDecorrelator}}. {noformat} 2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] rules.RelFieldTrimmer: Plan after trimming unused fields HiveProject(b_title=[$0]) HiveFilter(condition=[EXISTS({ HiveProject(_o__c0=[1]) HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), $cor0.b_authorkey, 300), $0)]) HiveTableScan(table=[[default, author]], table:alias=[a]) })]) HiveProject(b_title=[$1], b_authorkey=[$2]) HiveTableScan(table=[[default, book]], table:alias=[b]) 2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] parse.CalcitePlanner: Plan before removing subquery: HiveProject(b_title=[$1]) HiveFilter(condition=[EXISTS({ HiveProject(_o__c0=[1]) HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), $cor0.b_authorkey, 300), $0)]) HiveTableScan(table=[[default, author]], table:alias=[a]) })]) HiveTableScan(table=[[default, book]], table:alias=[b]) 2021-03-30T06:07:50,280 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] parse.CalcitePlanner: Plan just after removing subquery: HiveProject(b_title=[$1]) LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}]) HiveTableScan(table=[[default, book]], table:alias=[b]) HiveProject(literalTrue=[true]) HiveProject(_o__c0=[1]) HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), $cor0.b_authorkey, 300), $0)]) HiveTableScan(table=[[default, author]], table:alias=[a]) 2021-03-30T06:07:50,282 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] parse.CalcitePlanner: Plan after decorrelation: HiveProject(b_title=[$1]) HiveSemiJoin(condition=[=($8, $2)], joinType=[semi]) HiveTableScan(table=[[default, book]], table:alias=[b]) HiveProject(literalTrue=[true], b_authorkey=[$1]) HiveProject(_o__c0=[1], b_authorkey=[$6]) HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)], joinType=[inner], algorithm=[none], cost=[not available]) HiveTableScan(table=[[default, author]], table:alias=[a]) HiveAggregate(group=[{0}]) HiveProject(b_authorkey=[$2]) HiveTableScan(table=[[default, book]], table:alias=[b]) {noformat} The problem starts with the introduction of the {{HiveSemiJoin}}. Due to that books with NULL {{b_authorkey}} are removed from the result set. > Wrong results when subquery has COALESCE in correlation predicate > ----------------------------------------------------------------- > > Key: HIVE-24957 > URL: https://issues.apache.org/jira/browse/HIVE-24957 > Project: Hive > Issue Type: Bug > Affects Versions: 4.0.0 > Reporter: Stamatis Zampetakis > Assignee: Stamatis Zampetakis > Priority: Major > > Consider the following example: > {code:sql} > create table author ( > a_authorkey int, > a_name varchar(50)); > create table book ( > b_bookkey int, > b_title varchar(50), > b_authorkey int); > insert into author values (10, 'Victor Hugo'); > insert into author values (20, 'Alexandre Dumas'); > insert into author values (300, 'UNKNOWN'); > insert into book values (1, 'Les Miserables', 10); > insert into book values (2, 'The Count of Monte Cristo', 20); > insert into book values (3, 'Men Without Women', 30); > insert into book values (4, 'Odyssey', null); > select b.b_title > from book b > where exists > (select a_authorkey > from author a > where coalesce(b.b_authorkey, 300) = a.a_authorkey); > {code} > *Expected results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > |Odyssey| > *Actual results* > ||B_TITLE|| > |Les Miserables| > |The Count of Monte Cristo| > {{Odyssey}} is missing from the result set and it shouldn't since with the > application of COALESCE operator it should match with the UNKNOWN author. -- This message was sent by Atlassian Jira (v8.3.4#803005)