Hive 1.2.1 on Spark 1.4.1

*The first query is:*
set mapred.reduce.tasks=100;
use u_wsd;
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
select t1.uin, t1.clientip
from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
left outer join (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t2
on t1.uin = t2.uin
where t2.clientip is NULL;
*The second query is:*
set mapred.reduce.tasks=100;
use u_wsd;
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151201)
select t1.uin, t1.clientip
from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151201) t1
left outer join (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151130) t2
on t1.uin = t2.uin
where t2.clientip is NULL;

*The attachment shows the stages of the two queries.*

*Here is the partition info:*
104.3 M /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151202
110.0 M /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151201
112.6 M /user/hive/warehouse/u_wsd.db/t_sd_ucm_cominfo_finalresult/ds=20151130

*Why do the two queries have different stages?*
*Stage 1 of the first query is very slow.*

*Thanks.*
*Best wishes.*