Hi all, I have a question. My test HQL is:

    select tmp.src_ip, c.to_ip
    from (select a.src_ip, b.appid
          from small_tbl a join im b on a.src_ip = b.src_ip) tmp
    join email c on tmp.appid = c.appid

where im and email are both big tables.

With set hive.execution.engine=mr, the execution plan contains two map-join stages. With set hive.execution.engine=spark, the plan contains one map join and one common join. That is, the subquery (select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip) is converted to a map join, BUT even though its result tmp has only 10 rows, "tmp join email" is not converted to a map join. I debugged the code and found:
In Hive on Spark:
(1) For (select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip), MapWork.getMapredLocalWork() is fine: there is one MapredLocalWork object.
(2) For the join of the previous stage's result tmp with email, MapWork.getMapredLocalWork() is null.

Why can't Hive on Spark convert this second join to a map join? Thank you.
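For anyone who wants to reproduce the comparison, here is a sketch of the session I used, run once per engine (table names small_tbl, im, and email are from the query above; the two SET options are the standard Hive auto-conversion knobs, and the threshold value is just an example):

```sql
-- Enable automatic map-join conversion and set a small-table size threshold.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Run once with hive.execution.engine=mr and once with spark,
-- then compare the operator trees in the EXPLAIN output.
SET hive.execution.engine=mr;
EXPLAIN
SELECT tmp.src_ip, c.to_ip
FROM (SELECT a.src_ip, b.appid
      FROM small_tbl a JOIN im b ON a.src_ip = b.src_ip) tmp
JOIN email c ON tmp.appid = c.appid;

SET hive.execution.engine=spark;
EXPLAIN
SELECT tmp.src_ip, c.to_ip
FROM (SELECT a.src_ip, b.appid
      FROM small_tbl a JOIN im b ON a.src_ip = b.src_ip) tmp
JOIN email c ON tmp.appid = c.appid;
```

Under mr the plan shows two Map Join Operators; under spark the second join appears as a common Join Operator.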