Hi, I am running some Hive queries that join tables containing up to 30 million records each. Since the load on the reducers is very significant in these cases, I specifically set the following parameters before executing the queries:
set mapred.reduce.tasks=100;
set hive.exec.reducers.bytes.per.reducer=500000000;
set hive.optimize.cp=true;

The job now spawns 160 reducers, but despite the high number, most of the load still falls on just 1 or 2 of them. In the final statistics, 158 reducers completed within 2-3 minutes of starting, while the remaining 2 took 2 hours to run. Is there any way to overcome this disparity in load distribution? Any help in this regard will be highly appreciated.

Sincerely,
Saurabh Mishra
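
P.S. For reference, here is a consolidated sketch of the kind of session I am running. The table and column names (table_a, table_b, key, val) are placeholders for illustration only, not my actual schema:

-- session-level settings applied before the query, as described above
set mapred.reduce.tasks=100;
set hive.exec.reducers.bytes.per.reducer=500000000;
set hive.optimize.cp=true;

-- placeholder join; each input table holds up to ~30 million rows
SELECT a.key, a.val, b.val
FROM table_a a
JOIN table_b b
  ON (a.key = b.key);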