How about using MapJoin?
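
Going by the row counts quoted below, every table except tableA is tiny (15 to 14000 rows), so they should fit in memory on each mapper. A rough sketch of what I mean, not tested against your schema: only the a.x = b.y condition comes from your query, and the column names col_a, col_b and col_d are placeholders I made up.

```sql
-- Sketch only. a.x = b.y comes from the query quoted below; every other
-- column name (col_a, col_b, col_d) is a hypothetical placeholder.

-- Option 1: let Hive convert joins against small tables into map joins.
set hive.auto.convert.join=true;

-- Option 2: request the map join explicitly with a hint.
SELECT /*+ MAPJOIN(b) */
       a.col_a, b.col_b, count(a.col_d)
FROM tableA a
JOIN tableB b ON (a.x = b.y)
-- ... the other small tables would be added to the hint and joined the same way
GROUP BY a.col_a, b.col_b;
```

With a map join the small tables are loaded into memory on each mapper, so the join itself no longer sends rows to reducers by join key. The group by still runs in the reduce phase, so this only helps if the imbalance comes from the join.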

2012/10/16 Saurabh Mishra <saurabhmishra.i...@outlook.com>

> No, there is apparently no heavy skew. One more statistic I wanted to
> point out: the approximate row counts of the tables in this 4 table join
> query are:
> tableA: 170 million (actual number; I am also exploding these records,
> so the number could be much higher)
> tableB: 15
> tableC: 45
> tableD: 45
> tableE: 45
> tableF: 14000
>
> Also, I cannot put any filter condition on tableA; the situation does not
> permit it. :(
> Kindly suggest some alternative solution or some Hive configuration to
> distribute the load more evenly across the reducers.
>
> > Date: Mon, 15 Oct 2012 16:29:56 +0100
>
> > Subject: Re: Hive Query Unable to distribute load evenly in reducers
> > From: philip.j.trom...@gmail.com
> > To: user@hive.apache.org
>
> >
> > Is your data heavily skewed towards certain values of a.x etc?
> >
> > On 15 October 2012 15:23, Saurabh Mishra <saurabhmishra.i...@outlook.com> wrote:
> > > The queries are simple joins, something along the lines of
> > > select a, b, c, count(D) from tableA join tableB on a.x=b.y join....
> > > group by a, b,c;
> > >
> > >
> > >> From: liy...@gmail.com
> > >> Date: Mon, 15 Oct 2012 21:10:39 +0800
> > >> Subject: Re: Hive Query Unable to distribute load evenly in reducers
> > >> To: user@hive.apache.org
> > >
> > >>
> > >> And your queries were?
> > >>
> > >> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
> > >> <saurabhmishra.i...@outlook.com> wrote:
> > >> > Hi,
> > >> > I am running some Hive queries joining tables containing up to
> > >> > 30 million records each. Since the load on the reducers is very
> > >> > significant in these cases, I specifically set the following
> > >> > parameters before executing the queries:
> > >> >
> > >> > set mapred.reduce.tasks=100;
> > >> > set hive.exec.reducers.bytes.per.reducer=500000000;
> > >> > set hive.optimize.cp=true;
> > >> >
> > >> > The number of reducers the job spawns is now 160, but despite the
> > >> > high number, most of the load remains on 1 or 2 reducers. In the
> > >> > final statistics, 158 reducers completed within 2-3 minutes of
> > >> > starting and 2 reducers took 2 hours to run.
> > >> > Is there any way to overcome this load distribution disparity?
> > >> > Any help in this regard will be highly appreciated.
> > >> >
> > >> > Sincerely
> > >> > Saurabh Mishra
>
