[ 
https://issues.apache.org/jira/browse/HIVE-29164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016761#comment-18016761
 ] 

Thomas Rebele commented on HIVE-29164:
--------------------------------------

I think I've found a related bug:

Using tables as defined in `ql/src/test/queries/clientpositive/antijoin.q`, 
except changing the definition of t2_n33 to:
{code:java}
create table t2_n33 as select cast(2*key as int) key, value from t1_n55 union 
select 10 key, null value;
{code}
There's a difference when executing the query
{code:java}
select a.key, a.value, ''+a.value+b.value from t1_n55 a left join t2_n33 b on 
a.key=b.key where b.value is null;
{code}
With HiveJoin, the result contains a row {{{}10 val_10 NULL{}}}, with 
HiveAntiJoin the row is missing.

> AntiJoin rule fails to apply with an extra constant in the SELECT clause
> ------------------------------------------------------------------------
>
>                 Key: HIVE-29164
>                 URL: https://issues.apache.org/jira/browse/HIVE-29164
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Thomas Rebele
>            Assignee: Thomas Rebele
>            Priority: Major
>
> Adding a constant to the SELECT clause prevents Hive from applying the 
> HiveAntiSemiJoinRule.
> Example:
> {code:java}
> CREATE EXTERNAL TABLE tab(fieldA string, fieldB string);
> explain cbo select t1.fieldA
> from tab t1
> left join tab t2
> on t2.fieldA=t1.fieldA
> where t2.fieldA is null;
> {code}
> results in a plan
> {code:java}
> HiveAntiJoin(condition=[=($1, $0)], joinType=[anti])
>   HiveProject(fielda=[$0])
>     HiveTableScan(table=[[default, tab]], table:alias=[t1])
>   HiveProject(fielda=[$0])
>     HiveFilter(condition=[IS NOT NULL($0)])
>       HiveTableScan(table=[[default, tab]], table:alias=[t2])
> {code}
> However, including a constant column "abc" in the result
> {code:java}
> CREATE EXTERNAL TABLE tab(fieldA string, fieldB string);
> explain cbo select t1.fieldA, "abc"
> from tab t1
> left join tab t2
> on t2.fieldA=t1.fieldA
> where t2.fieldA is null;
> {code}
> leads to a plan without an anti-join:
> {code:java}
> HiveProject(fielda=[$0], _o__c1=[_UTF-16LE'abc':VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE"])
>   HiveFilter(condition=[IS NULL($1)])
>     HiveJoin(condition=[=($1, $0)], joinType=[left], algorithm=[none], 
> cost=[not available])
>       HiveProject(fielda=[$0])
>         HiveTableScan(table=[[default, tab]], table:alias=[t1])
>       HiveProject(fielda=[$0])
>         HiveFilter(condition=[IS NOT NULL($0)])
>           HiveTableScan(table=[[default, tab]], table:alias=[t2])
> {code}
> The cause is a bug in 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil#hasAnyExpressionFromRightSide.
>  The condition {{rightBitmap.contains(inputBits)}} returns true if no field 
> from the RHS of the join is used at all.
> Thanks to [~Dayakar] and [~krisztiankasa] for finding and discussing the 
> issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to