Re: Why there are two different stages on the same query when i use hive on spark.

2015-12-04 Thread Xuefu Zhang
The first stage of the first query builds a hash table for the map join. It took 7s to finish. Why do you think it's slow? That said, it seems you have many small files: there were 100 mappers, so each file would be very small. This is not good for performance. Also consider using other data form
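
To illustrate why a map join needs a separate first stage, here is a minimal Python sketch of the general hash-join idea: the small side is loaded into an in-memory hash table (the build phase), and then each row of the large side probes that table (the probe phase). This is not Hive's actual implementation. The function name map_side_left_outer_join and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_side_left_outer_join(big_rows, small_rows, key):
    """Left outer join of big_rows against small_rows on `key` (illustrative only)."""
    # Build phase: load the small side into an in-memory hash table.
    # In the thread, this corresponds to the extra first stage of the map-join query.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)

    # Probe phase: stream the big side and look each key up in the hash table.
    joined = []
    for row in big_rows:
        matches = table.get(row[key])
        if matches:
            for match in matches:
                joined.append((row, match))
        else:
            # Left outer join: keep the unmatched left row, with no right side.
            joined.append((row, None))
    return joined


# Hypothetical sample data using the column names from the thread.
big = [{"uin": 1, "clientip": "10.0.0.1"}, {"uin": 2, "clientip": "10.0.0.2"}]
small = [{"uin": 1}]
result = map_side_left_outer_join(big, small, "uin")
```

Because the probe phase needs no shuffle, the join itself can run in the mappers. A reduce join instead shuffles both sides by key, which is one reason the two queries can produce different stage layouts.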

Re: Why there are two different stages on the same query when i use hive on spark.

2015-12-03 Thread Jone Zhang
Thanks for your warning. The first query is a map join and the second query is a reduce join. The data format is all TextInputFormat. I'll go and learn more about map join in Hive on Spark anyway, but why is stage 1 of the first query in the attachment so slow? Explain first query: hive (u_wsd)> explai

Re: Why there are two different stages on the same query when i use hive on spark.

2015-12-03 Thread Xuefu Zhang
Can you also attach the explain result for the query? What's your data format?

--Xuefu

On Thu, Dec 3, 2015 at 12:09 AM, Jone Zhang wrote:
> Hive1.2.1 on Spark1.4.1
>
> The first query is:
> set mapred.reduce.tasks=100;
> use u_wsd;
> insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151

Why there are two different stages on the same query when i use hive on spark.

2015-12-03 Thread Jone Zhang
Hive1.2.1 on Spark1.4.1

The first query is:
set mapred.reduce.tasks=100;
use u_wsd;
insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
select t1.uin,t1.clientip from
(select uin,clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
left outer join
(select uin,