[ https://issues.apache.org/jira/browse/HIVE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-16698:
----------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

The test failure is unrelated. Committed to master. Thanks, Xuefu, for the review!

> HoS should avoid mapjoin optimization in case of union and using table stats
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16698
>                 URL: https://issues.apache.org/jira/browse/HIVE-16698
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Spark
>    Affects Versions: 3.0.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16698.1.patch
>
>
> When {{hive.spark.use.ts.stats.for.mapjoin}} is true, HoS (Hive on Spark) does
> not check whether the big-table branch has upstream UNION operators. This is
> wrong and can generate an incorrect plan. To reproduce:
> {code}
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=16;
> set hive.spark.use.ts.stats.for.mapjoin=true;
> create table a (c1 string, c2 int);
> create table b (c3 string, c4 int);
> create table c (c1 string, c2 int);
> create table d (c3 string, c4 int);
> create table e (c5 string, c6 int);
> insert into table a values
> ("a1", 1), ("a2", 2), ("a3", 3), ("a4", 4), ("a5", 5), ("a6", 6), ("a7", 7);
> insert into table b values
> ("b1", 1), ("b2", 2), ("b3", 3), ("b4", 4);
> insert into table c values
> ("c1", 1), ("c2", 2), ("c3", 3), ("c4", 4), ("c5", 5), ("c6", 6), ("c7", 7);
> insert into table d values
> ("d1", 1), ("d2", 2), ("d3", 3), ("d4", 4);
> insert into table e values
> ("d1", 1), ("d2", 2);
> explain
> with t1 as (
> select a.c1 as c1, a.c2 as c2, b.c3 as c3 from a join b on a.c2 = b.c4
> ),
> t2 as (
> select c.c1 as c1, c.c2 as c2, d.c3 as c3 from c join d on c.c2 = d.c4
> ),
> t3 as (
> select * from t1 union all select * from t2
> ),
> t4 as (
> select t3.c1, t3.c3, t5.c5 from t3 join e as t5 on t3.c2 = t5.c6
> )
> select * from t4;
> {code}
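The guard named in the issue title can be sketched as follows: when table-scan stats drive the map-join decision, walk upstream from the big-table branch and skip the conversion if a UNION is found. This is a minimal, hypothetical Java sketch. The `Op`/`OpType` model and the `hasUpstreamUnion`/`allowMapJoin` methods are illustrative assumptions, not Hive's operator API, and the committed patch (HIVE-16698.1.patch) may implement the check differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of an operator DAG; NOT Hive's Operator API.
final class UnionMapJoinCheck {

    enum OpType { TABLE_SCAN, FILTER, SELECT, JOIN, UNION, REDUCE_SINK }

    static final class Op {
        final OpType type;
        final List<Op> parents = new ArrayList<>();

        Op(OpType type, Op... parents) {
            this.type = type;
            Collections.addAll(this.parents, parents);
        }
    }

    // Walks upstream (towards the TableScans) and reports whether any
    // ancestor of the given operator is a UNION. Assumes the graph is acyclic.
    static boolean hasUpstreamUnion(Op op) {
        for (Op parent : op.parents) {
            if (parent.type == OpType.UNION || hasUpstreamUnion(parent)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical guard: when table-scan stats are used for the decision,
    // refuse the map-join conversion if the big-table branch passes through
    // a UNION. When useTsStats is false this guard does not apply.
    static boolean allowMapJoin(boolean useTsStats, Op bigTableBranch) {
        return !(useTsStats && hasUpstreamUnion(bigTableBranch));
    }

    private UnionMapJoinCheck() {}
}
```

In the repro above, the branch feeding the final join from `t3` sits below the `union all` of `t1` and `t2`, so a guard like this one would keep that join from being converted to a map join, while the branch scanning `e` alone would pass the check.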



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)