The queries are simple joins, something along the lines of:

select a, b, c, count(D) from tableA join tableB on tableA.x = tableB.y join.... group by a, b, c;
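One way to check whether a few join keys carry most of the rows (which would explain one or two reducers doing nearly all the work) is to count rows per key. This is only a sketch: tableA and x are the placeholder names from the query shape above, not real table or column names, and the LIMIT of 20 is arbitrary.

```sql
-- Hypothetical diagnostic: tableA and x are placeholders, not real names.
-- Counts rows per join key and lists the heaviest keys first.
-- A few very large counts (or a large NULL group) would point to key skew,
-- since all rows with the same key are sent to the same reducer.
SELECT x, COUNT(*) AS cnt
FROM tableA
GROUP BY x
ORDER BY cnt DESC
LIMIT 20;
```

The same check can be run against tableB on y. If a few keys do dominate, Hive's skew-related settings such as hive.optimize.skewjoin and hive.groupby.skewindata may be worth trying, though whether they help with these particular queries is untested here.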
> From: liy...@gmail.com
> Date: Mon, 15 Oct 2012 21:10:39 +0800
> Subject: Re: Hive Query Unable to distribute load evenly in reducers
> To: user@hive.apache.org
>
> And your queries were?
>
> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
> <saurabhmishra.i...@outlook.com> wrote:
> > Hi,
> > I am firing some Hive queries joining tables containing up to 30 million
> > records each. Since the load on the reducers is very significant in these
> > cases, I specifically set the following parameters before executing the
> > queries:
> >
> > set mapred.reduce.tasks=100;
> > set hive.exec.reducers.bytes.per.reducer=500000000;
> > set hive.optimize.cp=true;
> >
> > The number of reducers the job spawns is now 160, but despite the high number
> > most of the load remains on 1 or 2 reducers. Hence in the final
> > statistics, 158 reducers completed within 2-3 minutes of starting and 2
> > reducers took 2 hrs to run.
> > Is there any way to overcome this load distribution disparity?
> > Any help in this regard will be highly appreciated.
> >
> > Sincerely
> > Saurabh Mishra