Hi all,

I'm currently testing Hive 0.11 and have run into a bug with hive.auto.convert.join. I have put together a test case so that anyone can reproduce it (it is also available here: https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):

    use test;

    create table src (`key` int, `val` string);
    load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;

    drop table if exists orderpayment_small;
    create table orderpayment_small (`dealid` int, `date` string, `time` string, `cityid` int, `userid` int);
    insert overwrite table orderpayment_small
    select 748, '2011-03-24', '2011-03-24', 55, 5372613 from src limit 1;

    drop table if exists user_small;
    create table user_small (userid int);
    insert overwrite table user_small select key from src limit 100;

    set hive.auto.convert.join.noconditionaltask.size = 200;

    SELECT
        `dim_pay_date`.`date`,
        `deal`.`dealid`
    FROM `orderpayment_small` `orderpayment`
    JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = `orderpayment`.`date`
    JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
    JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
    JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
    limit 5;

You will need to replace the path to kv1.txt with the one on your own machine.

Running the above query in Hive 0.11 fails with an ArrayIndexOutOfBoundsException. The explain output and the console output of the query are here: https://gist.github.com/code6/6187569

I also compiled the trunk code, and the query does not work there either. The same query runs in Hive 0.9 with hive.auto.convert.join turned on.

I tried to dig into the problem, and I think it may be caused by the map join optimization: some adjacent operators do not agree on their input/output tableinfo (the column positions differ).

I have not been able to fix this bug myself, and I would appreciate it if someone could look into it.

Thanks.