Hi,
I am running some Hive queries that join tables containing up to 30 million records 
each. Since the load on the reducers is very significant in these cases, I 
explicitly set the following parameters before executing the queries:

set mapred.reduce.tasks=100;
set hive.exec.reducers.bytes.per.reducer=500000000;
set hive.optimize.cp=true;
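
For context, the join queries are roughly of the shape sketched below; the table and 
column names (orders, customers, cust_id) are only illustrative placeholders, not my 
actual schema:

-- illustrative join, run after applying the settings above
SELECT a.cust_id, COUNT(*) AS cnt
FROM orders a
JOIN customers b ON (a.cust_id = b.cust_id)
GROUP BY a.cust_id;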

The number of reducers the job now spawns is 160, but despite the high count, 
most of the load still falls on 1 or 2 reducers. Hence, in the final statistics, 
158 reducers complete within 2-3 minutes of starting, while 2 reducers take 2 hours 
to run.
Is there any way to overcome this disparity in load distribution?
Any help in this regard will be highly appreciated.

Sincerely
Saurabh Mishra
