I'm really not convinced that there's no skew in your data. Look at the counters from the Hadoop TaskTracker pages, and thoroughly check that the numbers of reducer input records / groups and output records are all similar.
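A quick cross-check on the Hive side, as a rough sketch only: count the rows per join key on the big table and look at the heaviest keys. This assumes the join column on tableA is x, as in the a.x=b.y condition quoted below; the column name and the limit of 20 are placeholders to adjust to the real query.

```sql
-- Sketch: list the most frequent join-key values in tableA.
-- "x" is assumed from the quoted a.x=b.y condition; replace with the real column.
SELECT x, COUNT(*) AS cnt
FROM tableA
GROUP BY x
ORDER BY cnt DESC
LIMIT 20;
```

If one or two values of x account for most of the rows, those keys all hash to the same reducers, which would match the 2 slow reducers out of 160. In that case hive.optimize.skewjoin and hive.groupby.skewindata may be worth trying.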
Phil.

On 18 October 2012 09:56, Saurabh Mishra <saurabhmishra.i...@outlook.com> wrote:
> Any views on the problem?
>
> ________________________________
> From: saurabhmishra.i...@outlook.com
> To: user@hive.apache.org; navis....@nexr.com
> Subject: RE: Hive Query Unable to distribute load evenly in reducers
> Date: Tue, 16 Oct 2012 11:23:29 +0530
>
> If by using MapJoin you mean setting
> set hive.auto.convert.join=true;
> then I am already using this configuration, but to no avail... :(
>
> ________________________________
> Date: Tue, 16 Oct 2012 14:17:47 +0900
> Subject: Re: Hive Query Unable to distribute load evenly in reducers
> From: navis....@nexr.com
> To: user@hive.apache.org
>
> How about using MapJoin?
>
> 2012/10/16 Saurabh Mishra <saurabhmishra.i...@outlook.com>
>
> No, there is apparently no heavy skew. Another statistic I wanted to
> point out: the following are the approximate table sizes in this 4 table
> join query:
> tableA: 170 million (actual number; I am also exploding these records,
> so the number could be much higher)
> tableB: 15
> tableC: 45
> tableD: 45
> tableE: 45
> tableF: 14000
>
> Also, I cannot put any filter condition on tableA; the situation does
> not permit it. :(
> Kindly suggest some alternative solution, or some Hive configuration to
> distribute the load better across the reducers.
>
>> Date: Mon, 15 Oct 2012 16:29:56 +0100
>> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> From: philip.j.trom...@gmail.com
>> To: user@hive.apache.org
>>
>> Is your data heavily skewed towards certain values of a.x etc?
>>
>> On 15 October 2012 15:23, Saurabh Mishra <saurabhmishra.i...@outlook.com>
>> wrote:
>> > The queries are simple joins, something along the lines of
>> > select a, b, c, count(D) from tableA join tableB on a.x=b.y join....
>> > group by a, b, c;
>> >
>> >> From: liy...@gmail.com
>> >> Date: Mon, 15 Oct 2012 21:10:39 +0800
>> >> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> >> To: user@hive.apache.org
>> >>
>> >> And your queries were?
>> >>
>> >> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
>> >> <saurabhmishra.i...@outlook.com> wrote:
>> >> > Hi,
>> >> > I am firing some Hive queries joining tables containing up to
>> >> > 30 million records each. Since the load on the reducers is very
>> >> > significant in these cases, I specifically set the following
>> >> > parameters before executing the queries:
>> >> >
>> >> > set mapred.reduce.tasks=100;
>> >> > set hive.exec.reducers.bytes.per.reducer=500000000;
>> >> > set hive.optimize.cp=true;
>> >> >
>> >> > The number of reducers the job spawns is now 160, but despite the
>> >> > high number, most of the load remains on 1 or 2 reducers. Hence, in
>> >> > the final statistics, 158 reducers completed within 2-3 minutes of
>> >> > starting and 2 reducers took 2 hrs to run.
>> >> > Is there any way to overcome this load distribution disparity?
>> >> > Any help in this regard will be highly appreciated.
>> >> >
>> >> > Sincerely
>> >> > Saurabh Mishra