Hi all,
I have a question:
My test HQL is:

    select tmp.src_ip, c.to_ip
    from (select a.src_ip, b.appid
          from small_tbl a join im b on a.src_ip = b.src_ip) tmp
    join email c on tmp.appid = c.appid

where im and email are big tables.

With hive.execution.engine=mr, the execution plan contains two map-join stages; with hive.execution.engine=spark, it contains one map join and one common join. That is, the subquery (select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip) is executed as a map join, and its result tmp has only 10 rows, BUT tmp join email cannot be executed as a map join.
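If it helps to reproduce, the two plans can be compared by running EXPLAIN under each engine. A minimal sketch (table names as above; the SET options are stock Hive settings):

```sql
-- Compare the plans under the two engines (switch the first line to mr/spark).
SET hive.execution.engine=spark;
SET hive.auto.convert.join=true;   -- automatic map-join conversion

EXPLAIN
SELECT tmp.src_ip, c.to_ip
FROM (SELECT a.src_ip, b.appid
      FROM small_tbl a JOIN im b ON a.src_ip = b.src_ip) tmp
JOIN email c ON tmp.appid = c.appid;
```

Under mr this shows two map-join stages; under spark, only the inner join is converted.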
I debugged the code:


In Hive on Spark:
(1) For the subquery (select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip), MapWork.getMapredLocalWork() is OK: there is one MapredLocalWork object.
(2) For the join of the previous stage's result 'tmp' with email, MapWork.getMapredLocalWork() is null.
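For completeness, these are the stock Hive settings that govern automatic map-join conversion; the values shown are the usual defaults, but they are worth verifying on the cluster in case one of them is ruling out the second conversion:

```sql
-- Settings controlling automatic map-join conversion (defaults shown;
-- verify the actual values on the cluster).
SET hive.auto.convert.join=true;                   -- convert common joins to map joins
SET hive.auto.convert.join.noconditionaltask=true; -- convert without a conditional task
SET hive.auto.convert.join.noconditionaltask.size=10000000; -- small-table size limit (bytes)
SET hive.mapjoin.smalltable.filesize=25000000;     -- small-table threshold for conditional map join
```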


Why can't Hive on Spark use a map join in this case? Thank you.




