RE: Hive Query Unable to distribute load evenly in reducers

Saurabh Mishra Mon, 15 Oct 2012 07:23:48 -0700

The queries are simple joins, something along the lines of:

select a, b, c, count(D) from tableA join tableB on a.x=b.y join ... group by a, b, c;



> From: liy...@gmail.com
> Date: Mon, 15 Oct 2012 21:10:39 +0800
> Subject: Re: Hive Query Unable to distribute load evenly in reducers
> To: user@hive.apache.org
> 
> And your queries were?
> 
> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
> <saurabhmishra.i...@outlook.com> wrote:
> > Hi,
> > I am running some Hive queries that join tables containing up to 30 million
> > records each. Since the load on the reducers is very significant in these
> > cases, I specifically set the following parameters before executing the
> > queries:
> >
> > set mapred.reduce.tasks=100;
> > set hive.exec.reducers.bytes.per.reducer=500000000;
> > set hive.optimize.cp=true;
> >
> > The job now spawns 160 reducers, but despite the high number, most of
> > the load remains on 1 or 2 reducers. In the final statistics, 158
> > reducers completed within 2-3 minutes of starting, while the other 2
> > reducers took 2 hours to run.
> > Is there any way to overcome this uneven load distribution?
> > Any help in this regard will be highly appreciated.
> >
> > Sincerely
> > Saurabh Mishra
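One possible explanation for this pattern, not confirmed anywhere in the thread, is key skew. In a Hive join or GROUP BY, every row that shares a key is shuffled to the same reducer, so raising the reducer count cannot split a single very frequent key value. The Python sketch below illustrates this under stated assumptions: integer keys, a partitioner that mimics Hadoop's default HashPartitioner ((hashCode & Integer.MAX_VALUE) % numReduceTasks, where an int's hashCode is its value), and a hypothetical dataset in which one "hot" key stands in for something like a NULL or default join value.

```python
from collections import Counter


def partition(key: int, num_reducers: int) -> int:
    # Assumed to mimic Hadoop's default HashPartitioner for an integer key:
    # (hashCode & Integer.MAX_VALUE) % numReduceTasks, with hashCode == value.
    return (key & 0x7FFFFFFF) % num_reducers


def rows_per_reducer(keys, num_reducers):
    # Count how many rows each reducer receives when rows are routed by key.
    loads = Counter(partition(k, num_reducers) for k in keys)
    return [loads.get(r, 0) for r in range(num_reducers)]


# Hypothetical data: 1,000 distinct keys with 100 rows each, plus one hot
# key (0) that appears 100,000 times.
keys = [k for k in range(1, 1001) for _ in range(100)] + [0] * 100_000

loads = rows_per_reducer(keys, 100)
print(max(loads), min(loads))  # prints: 101000 1000
```

With 160 reducers instead of 100, reducer 0 still receives all 100,000 rows of the hot key, which is consistent with the report above that adding reducers left one or two of them doing most of the work.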