The first stage of the first query builds the hash table for the map join. It
took 7s to finish. Why do you think that's slow? That said, it looks like you
have many small files: there were 100 mappers, so each file must be very
small. This is not good for performance. Also consider using other data
form
*Thanks for your warning.*
*The first query is a map join and the second query is a reduce join. All of
the data is in TextInputFormat.*
*I'll go learn more about map join in **Hive on Spark** anyway, but why is
stage 1 of the first query in the attachment so slow?*
*EXPLAIN output of the first query:*
hive (u_wsd)> explai
Can you also attach the EXPLAIN output for the query? What's your data format?
--Xuefu
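
For readers following the thread, here is a minimal sketch of how an EXPLAIN plan can be produced from the Hive CLI, together with settings that are sometimes used to merge small output files in Hive on Spark. The subquery is taken from the original message; the setting values are illustrative assumptions, not tuned recommendations, and whether they help this particular job has not been verified.

```sql
-- Print the query plan instead of running the query.
-- The SELECT below is the first subquery from the original message.
use u_wsd;
explain
select uin, clientip
from t_sd_ucm_cominfo_FinalResult
where ds = 20151202;

-- Assumption: merging small files at the end of a Spark job may reduce
-- the number of tiny files that later queries have to read.
set hive.merge.sparkfiles=true;
-- Illustrative thresholds (bytes): trigger a merge when the average output
-- file is smaller than ~16 MB, and aim for ~256 MB merged files.
set hive.merge.smallfiles.avgsize=16000000;
set hive.merge.size.per.task=256000000;
```

The plan printed by `explain` lists the stages and shows which stage builds the map join hash table, which is the information requested above.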
On Thu, Dec 3, 2015 at 12:09 AM, Jone Zhang wrote:
> Hive1.2.1 on Spark1.4.1
>
> *The first query is:*
> set mapred.reduce.tasks=100;
> use u_wsd;
> insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151
Hive1.2.1 on Spark1.4.1
*The first query is:*
set mapred.reduce.tasks=100;
use u_wsd;
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
select t1.uin,t1.clientip from
(select uin,clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202)
t1
left outer join (select uin,