Hi guys,

I am trying to optimize a Hive query that joins two big tables. The join
takes too long: no matter how many reducers I set, there are always two
reducers still struggling to finish at the end of the job.
The job does not always complete; sometimes it fails with memory problems.

In the reducers that complete quickly I can see:
7688459 rows: used memory = 991337736

In the long-running reducers:

43363436 rows: used memory = 1142368456


At first I thought I was dealing with a skewed key, but setting
hive.optimize.skewjoin to true didn't change a thing. I also played with
hive.skewjoin.key, and that didn't change anything either.
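
For reference, here is a rough sketch of the settings in question, plus a query that would show whether a few join keys dominate. The table name big_a and the column join_key are hypothetical placeholders, and 100000 is Hive's default for hive.skewjoin.key, not necessarily a value that was tried here.

```sql
-- Settings mentioned above. 100000 is Hive's default threshold for
-- hive.skewjoin.key and is shown only for illustration.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;

-- Hypothetical table and column names: count rows per join key to see
-- whether a handful of keys account for most of the rows.
SELECT join_key, COUNT(*) AS cnt
FROM big_a
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;
```

If the top few keys hold far more rows than the rest (or if NULL is among them), that would be consistent with two reducers receiving most of the data.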

Any other ideas I can try?

I am using Hive 0.10 from CDH4.2.1.

The source tables use custom SerDes.


Thanks
Guy Doulberg
Team leader @ Perion
