Is your data heavily skewed towards certain values of a.x etc?
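
If it helps, here is a rough way to check. This is only a sketch: it assumes x and y are the join columns of tableA and tableB as in your query, and the LIMIT of 20 is arbitrary. If one or two keys (or NULLs) account for most of the rows, all of those rows hash to the same reducer, which would match what you are seeing. The settings at the end are Hive's standard skew options; whether they help depends on your data and Hive version, so treat them as something to try rather than a guaranteed fix.

```sql
-- Row count per join key on each side; a few very large counts indicate skew.
SELECT x, COUNT(*) AS cnt
FROM tableA
GROUP BY x
ORDER BY cnt DESC
LIMIT 20;

SELECT y, COUNT(*) AS cnt
FROM tableB
GROUP BY y
ORDER BY cnt DESC
LIMIT 20;

-- If a handful of keys dominate, these options may spread the work out.
-- Handle heavily repeated join keys in a follow-up map-side join:
set hive.optimize.skewjoin=true;
-- Rows per key above which a key is treated as skewed (100000 is the default):
set hive.skewjoin.key=100000;
-- Two-stage aggregation so one hot group-by key does not land on one reducer:
set hive.groupby.skewindata=true;
```
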
On 15 October 2012 15:23, Saurabh Mishra <saurabhmishra.i...@outlook.com> wrote:
> The queries are simple joins, something along the lines of
> select a, b, c, count(D) from tableA join tableB on a.x=b.y join.... group
> by a, b, c;
>
>> From: liy...@gmail.com
>> Date: Mon, 15 Oct 2012 21:10:39 +0800
>> Subject: Re: Hive Query Unable to distribute load evenly in reducers
>> To: user@hive.apache.org
>>
>> And your queries were?
>>
>> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
>> <saurabhmishra.i...@outlook.com> wrote:
>> > Hi,
>> > I am running some Hive queries that join tables containing up to
>> > 30 million records each. Since the load on the reducers is very
>> > significant in these cases, I specifically set the following
>> > parameters before executing the queries:
>> >
>> > set mapred.reduce.tasks=100;
>> > set hive.exec.reducers.bytes.per.reducer=500000000;
>> > set hive.optimize.cp=true;
>> >
>> > The number of reducers the job spawns is now 160, but despite the
>> > high number, most of the load remains on 1 or 2 reducers. Hence in
>> > the final statistics, 158 reducers completed within 2-3 minutes of
>> > starting and 2 reducers took 2 hrs to run.
>> > Is there any way to overcome this load distribution disparity?
>> > Any help in this regard will be highly appreciated.
>> >
>> > Sincerely
>> > Saurabh Mishra