[ https://issues.apache.org/jira/browse/HIVE-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ping Lu updated HIVE-13265:
---------------------------
    Attachment: explain2.txt
                execution2.txt
                execution1.txt
                explain1.txt

Query consists of union all and mapjoin, throw Exception “Unable to deserialize reduce input key”
-------------------------------------------------------------------------------------------------

                Key: HIVE-13265
                URL: https://issues.apache.org/jira/browse/HIVE-13265
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.13.1
        Environment: Hadoop 2.4.0, Hive 0.13.1
           Reporter: Ping Lu
        Attachments: execution1.txt, execution2.txt, explain1.txt, explain2.txt

Steps to reproduce

Prepare: create four test tables and load data.

create table tmp_test1(col1 string);
create table tmp_test2(col1 string);
create table tmp_test3(col1 string, col2 string) row format delimited fields terminated by "\t";
create table tmp_test4(col1 string);
load data local inpath "test3" into table tmp_test1; -- 6 rows
load data local inpath "test3" into table tmp_test2; -- 5 rows
load data local inpath "test3" into table tmp_test3; -- 6 rows
load data local inpath "test4" into table tmp_test4; -- 3000011 rows, 26670421 bytes (> 25M)

Query1: an error is thrown during execution

set hive.auto.convert.join=true;
select
  sq.col1,
  count(distinct sq.col2) num
from (
  select
    col1,
    null col2
  from
    tmp_test1
  union all
  select
    col1,
    null col2
  from
    tmp_test2
  union all
  select
    col1,
    col2
  from
    tmp_test3
) sq -- sq's size is far smaller than 25M
join
  tmp_test4 ta
on sq.col1 = ta.col1
group by sq.col1;

When hive.auto.convert.join is set to true, the join is converted to a MapJoin and sq is chosen as the small table.
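The size-based choice described above can be illustrated with a simplified model. This is a hypothetical Python sketch, not Hive's implementation: the function name `choose_big_table` is made up, the 25,000,000-byte threshold is assumed to be the "25M" referred to in this report (it matches the default of `hive.mapjoin.smalltable.filesize`), and the size used for sq is an invented placeholder, since the report only says it is far smaller than 25M.

```python
# Hypothetical, simplified model of size-based map-join conversion.
# Not Hive's actual code; names and the sq size are illustrative only.

SMALL_TABLE_THRESHOLD = 25_000_000  # bytes; assumed to be the "25M" in the report


def choose_big_table(sizes, threshold=SMALL_TABLE_THRESHOLD):
    """Return the alias to stream as the big table, or None if no map join fits.

    An input qualifies as the big table when the combined size of all
    *other* inputs (which would be held in memory) is within the threshold.
    Among qualifying inputs, the largest one is preferred.
    """
    candidates = []
    for alias in sizes:
        others = sum(size for a, size in sizes.items() if a != alias)
        if others <= threshold:
            candidates.append(alias)
    if not candidates:
        return None
    return max(candidates, key=lambda a: sizes[a])


sizes = {
    "sq": 1_000,       # hypothetical: only known to be far below 25M
    "ta": 26_670_421,  # tmp_test4 size from the report
}
big = choose_big_table(sizes)
print(big)  # "ta" is streamed, so sq is the in-memory (small) table
```

In this model tmp_test4 (ta) alone exceeds the threshold, so it can never be the in-memory side, which leaves sq as the small table, matching the plan described for Query1. With hive.auto.convert.join=false (Query2) no such conversion is attempted and a common reduce-side join is used.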
Query2: the same SELECT query returns the correct result

set hive.auto.convert.join=false;
select
  sq.col1,
  count(distinct sq.col2) num
from (
  select
    col1,
    null col2
  from
    tmp_test1
  union all
  select
    col1,
    null col2
  from
    tmp_test2
  union all
  select
    col1,
    col2
  from
    tmp_test3
) sq
join
  tmp_test4 ta
on sq.col1 = ta.col1
group by sq.col1;

The execution plan for Query1 is in explain1.txt.
The Hive execution log for the Query1 SELECT statement is in execution1.txt.
The execution plan for Query2 is in explain2.txt.
The Hive execution log for Query2 is in execution2.txt.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)