[ 
https://issues.apache.org/jira/browse/HIVE-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ping Lu updated HIVE-13265:
---------------------------
    Attachment: explain2.txt
                execution2.txt
                execution1.txt
                explain1.txt

> Query consists of union all and mapjoin, throw Exception “Unable to 
> deserialize reduce input key”
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13265
>                 URL: https://issues.apache.org/jira/browse/HIVE-13265
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>         Environment: Hadoop 2.4.0, Hive 0.13.1
>            Reporter: Ping Lu
>         Attachments: execution1.txt, execution2.txt, explain1.txt, 
> explain2.txt
>
>
> Steps to reproduce
> Prepare: create four test tables and load data.
>       create table tmp_test1(col1 string);
>       create table tmp_test2(col1 string);
>       create table tmp_test3(col1 string,col2 string) row format delimited 
> fields terminated by "\t";  
>       create table tmp_test4(col1 string);
> load data local inpath "test3" into table tmp_test1;  -- 6 rows
> load data local inpath "test3" into table tmp_test2;  -- 5 rows
> load data local inpath "test3" into table tmp_test3;  -- 6 rows
> load data local inpath "test4" into table tmp_test4;  -- 3000011 rows, 26670421 bytes (> 25 MB)
> Query1: fails with an error during execution
> set hive.auto.convert.join=true;
> select
>     sq.col1,
>     count(distinct sq.col2) num
> from(
>     select
>         col1,
>         null col2
>     from
>         tmp_test1
>     union all
>     select
>         col1,
>         null col2
>     from
>         tmp_test2
>     union all
>     select
>         col1,
>         col2
>     from
>         tmp_test3
> )sq -- the size of sq is far smaller than 25 MB
> join
>     tmp_test4 ta
> ON sq.col1 = ta.col1
> group by sq.col1;
> When hive.auto.convert.join is set to true, the join is converted to a 
> MapJoin and sq is chosen as the small table.
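> The small-table choice above is governed by a handful of Hive settings. As a 
> minimal sketch (the property names come from the Hive configuration 
> documentation, not from this report, and their values on this cluster are 
> unknown), the current values can be printed from the Hive CLI:
> ```sql
> -- "set <name>;" without a value prints the current setting
> set hive.auto.convert.join;
> -- small-table size threshold, in bytes
> set hive.mapjoin.smalltable.filesize;
> -- whether the join is converted without a conditional task, and its size limit in bytes
> set hive.auto.convert.join.noconditionaltask;
> set hive.auto.convert.join.noconditionaltask.size;
> ```
> If hive.mapjoin.smalltable.filesize is at its documented default of 25000000 
> bytes, tmp_test4 (26670421 bytes) exceeds it, which would be consistent with 
> sq being picked as the small table.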
> Query2: returns the correct result
> set hive.auto.convert.join=false;
> select
>     sq.col1,
>     count(distinct sq.col2) num
> from(
>     select
>         col1,
>         null col2
>     from
>         tmp_test1
>     union all
>     select
>         col1,
>         null col2
>     from
>         tmp_test2
>     union all
>     select
>         col1,
>         col2
>     from
>         tmp_test3
> )sq
> join
>     tmp_test4 ta
> ON sq.col1 = ta.col1
> group by sq.col1; 
> The execution plan for Query1 is attached as explain1.txt.
> The Hive execution log for Query1 is attached as execution1.txt.
> The execution plan for Query2 is attached as explain2.txt.
> The Hive execution log for Query2 is attached as execution2.txt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
