Why are there two different stages for the same query when I use Hive on Spark?

Jone Zhang Thu, 03 Dec 2015 00:10:32 -0800

Hive 1.2.1 on Spark 1.4.1

*The first query is:*
set mapred.reduce.tasks=100;
use u_wsd;
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
select t1.uin, t1.clientip
from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
left outer join
  (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t2
on t1.uin = t2.uin
where t2.clientip is NULL;


*The second query is:*
set mapred.reduce.tasks=100;
use u_wsd;
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151201)
select t1.uin, t1.clientip
from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t1
left outer join
  (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151130) t2
on t1.uin = t2.uin
where t2.clientip is NULL;
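
One way to see where the two plans diverge is to prefix each statement with EXPLAIN, which prints the stage plan without running the query. This is a minimal sketch for the first query, assuming the same session settings as above. The second query could be checked the same way with its own ds values.

```sql
-- Print the stage plan for the first query without executing it.
set mapred.reduce.tasks=100;
use u_wsd;
explain
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
select t1.uin, t1.clientip
from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
left outer join
  (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t2
on t1.uin = t2.uin
where t2.clientip is NULL;
```

Comparing the two EXPLAIN outputs side by side should show at which stage the plans start to differ.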

*The attachment shows the stages of the two queries.*
*Here is the partition info:*
104.3 M  /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151202
110.0 M  /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151201
112.6 M  /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151130



*Why do the two queries have different stages?*
*Stage 1 of the first query is very slow.*

*Thanks.*
*Best wishes.*
