Hi all,
I'm currently testing Hive 0.11 and have run into a bug with 
hive.auto.convert.join. I put together a test case so that anyone can 
reproduce it (it is also available here: 
https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):

use test;
create table src ( `key` int,`val` string);
load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;
drop table if exists orderpayment_small;
create table orderpayment_small (`dealid` int, `date` string, `time` string, `cityid` int, `userid` int);
insert overwrite table orderpayment_small select 748, '2011-03-24', '2011-03-24', 55, 5372613 from src limit 1;
drop table if exists user_small;
create table user_small( userid int);
insert overwrite table user_small select key from src limit 100;
set hive.auto.convert.join.noconditionaltask.size = 200;
SELECT
`dim_pay_date`.`date`
, `deal`.`dealid`
FROM `orderpayment_small` `orderpayment`
JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = `orderpayment`.`date`
JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
limit 5;


Replace the path to kv1.txt with your own. Running the above query in Hive 
0.11 fails with an ArrayIndexOutOfBoundsException. The explain output and the 
console output of the query are here: 
https://gist.github.com/code6/6187569

I also compiled the trunk code, and the query does not work there either. The 
same query runs in Hive 0.9 with hive.auto.convert.join turned on.

I tried to dig into the problem, and I think it may be caused by the map join 
optimization: some adjacent operators do not agree on their input/output 
table info (the column positions differ).
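
One way to check whether the map join conversion is involved is to compare 
the plan with the conversion switched off. This is a minimal sketch and has 
not been verified against this case: it assumes that with 
hive.auto.convert.join set to false Hive plans ordinary common joins, so the 
operator tree differs from the one produced by the test case above.

```sql
-- Sketch only: disable automatic map join conversion for this session.
-- Assumption: Hive then plans common (reduce-side) joins for this query.
set hive.auto.convert.join = false;

-- Print the plan so it can be compared with the explain output in the gist.
EXPLAIN
SELECT
`dim_pay_date`.`date`
, `deal`.`dealid`
FROM `orderpayment_small` `orderpayment`
JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = `orderpayment`.`date`
JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
limit 5;
```

Setting hive.auto.convert.join back to true and running the same EXPLAIN 
gives the second plan for a side-by-side comparison of the column lists of 
adjacent operators.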

I haven't been able to fix this bug, and I would appreciate it if someone 
could look into it.

Thanks.
