[ https://issues.apache.org/jira/browse/HIVE-29164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016761#comment-18016761 ]
Thomas Rebele commented on HIVE-29164: -------------------------------------- I think I've found a related bug: Using tables as defined in `ql/src/test/queries/clientpositive/antijoin.q`, except changing the definition of t2_n33 to: {code:java} create table t2_n33 as select cast(2*key as int) key, value from t1_n55 union select 10 key, null value; {code} There's a difference when executing the query {code:java} select a.key, a.value, ''+a.value+b.value from t1_n55 a left join t2_n33 b on a.key=b.key where b.value is null; {code} With HiveJoin, the result contains a row {{{}10 val_10 NULL{}}}, with HiveAntiJoin the row is missing. > AntiJoin rule fails to apply with an extra constant in the SELECT clause > ------------------------------------------------------------------------ > > Key: HIVE-29164 > URL: https://issues.apache.org/jira/browse/HIVE-29164 > Project: Hive > Issue Type: Bug > Reporter: Thomas Rebele > Assignee: Thomas Rebele > Priority: Major > > Adding a constant to the SELECT clause prevents Hive from applying the > HiveAntiSemiJoinRule. > Example: > {code:java} > CREATE EXTERNAL TABLE tab(fieldA string, fieldB string); > explain cbo select t1.fieldA > from tab t1 > left join tab t2 > on t2.fieldA=t1.fieldA > where t2.fieldA is null; > {code} > results in a plan > {code:java} > HiveAntiJoin(condition=[=($1, $0)], joinType=[anti]) > HiveProject(fielda=[$0]) > HiveTableScan(table=[[default, tab]], table:alias=[t1]) > HiveProject(fielda=[$0]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[default, tab]], table:alias=[t2]) > {code} > However, including a constant column "abc" in the result > {code:java} > CREATE EXTERNAL TABLE tab(fieldA string, fieldB string); > explain cbo select t1.fieldA, "abc" > from tab t1 > left join tab t2 > on t2.fieldA=t1.fieldA > where t2.fieldA is null; > {code} > leads to a plan without an anti-join: > {code:java} > HiveProject(fielda=[$0], _o__c1=[_UTF-16LE'abc':VARCHAR(2147483647) CHARACTER > SET "UTF-16LE"]) > HiveFilter(condition=[IS NULL($1)]) > HiveJoin(condition=[=($1, $0)], joinType=[left], algorithm=[none], > cost=[not available]) > HiveProject(fielda=[$0]) > HiveTableScan(table=[[default, tab]], table:alias=[t1]) > HiveProject(fielda=[$0]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[default, tab]], table:alias=[t2]) > {code} > The cause is a bug in > org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil#hasAnyExpressionFromRightSide. > The condition {{rightBitmap.contains(inputBits)}} returns true if no field > from the RHS of the join is used at all. > Thanks to [~Dayakar] and [~krisztiankasa] for finding and discussing the > issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)