There were 3 different queries that exhibited this behavior ... one was over 30 days' worth of data and 2 were over 7 days' worth of data.
On Thu, Feb 10, 2011 at 3:49 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

> How many days of data are you working on?
>
> Sent via BlackBerry
> ------------------------------
> *From:* Viral Bajaria <viral.baja...@gmail.com>
> *Date:* Thu, 10 Feb 2011 15:21:32 -0800
> *To:* <user@hive.apache.org>
> *ReplyTo:* user@hive.apache.org
> *Subject:* Re: hive : question about reducers
>
> I don't have any explicit bucketing in my data. The data is partitioned by
> current_date (it has no hour information, so basically 24 hours of data).
>
> It's not a problem because eventually the job completes (super slow), but
> it would be nice to know the reason behind this behavior and how I could
> optimize it so that I can take full advantage of having multiple reducers
> running.
>
> -Viral
>
> On Thu, Feb 10, 2011 at 3:02 PM, Ajo Fod <ajo....@gmail.com> wrote:
>
>> I've had similar experiences ... usually with bucketing.
>>
>> Is this your experience too?
>>
>> -Ajo
>>
>> On Thu, Feb 10, 2011 at 1:57 PM, Viral Bajaria <viral.baja...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> In my Hive cluster, I have set mapred.reduce.tasks to -1, i.e. I am
>>> allowing Hive to figure out the number of reducers it needs from the
>>> data.
>>>
>>> When I run a query, it determines that it will need 4 reducers, but when
>>> I look at the MapReduce logs, I see that all the work is done by a single
>>> reducer while the other 3 reducers forward 0 rows. Is this just bad
>>> planning on Hive's side, or am I missing something?
>>>
>>> Thanks,
>>> Viral
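For reference, with mapred.reduce.tasks = -1 Hive estimates the reducer count from input size using the settings below (the values shown are illustrative, not recommendations):

```sql
-- Let Hive estimate the reducer count from input size (the setup in this thread).
SET mapred.reduce.tasks = -1;

-- Hive derives the estimate roughly as total input bytes / bytes-per-reducer,
-- capped by the max below. Lowering bytes.per.reducer yields more reducers,
-- but it cannot fix skew: rows are still routed by the hash of the shuffle
-- key, so one hot key still lands on one reducer.
SET hive.exec.reducers.bytes.per.reducer = 256000000;  -- illustrative value
SET hive.exec.reducers.max = 999;
```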
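For what it's worth, the "one busy reducer, three reducers forwarding 0 rows" symptom is exactly what hash partitioning produces when the shuffle key has a single dominant value (e.g. every row carrying the same current_date). A minimal sketch of the idea, not Hive's actual code, with made-up data and reducer count:

```python
def partition(key, num_reducers):
    # Mimics Hadoop's HashPartitioner: hash(key) mod numReduceTasks.
    return hash(key) % num_reducers

def rows_per_reducer(keys, num_reducers):
    # Count how many rows each reducer would receive during the shuffle.
    counts = [0] * num_reducers
    for k in keys:
        counts[partition(k, num_reducers)] += 1
    return counts

# Skewed: every row shares one key, so a single reducer does all the work
# and the other reducers receive 0 rows.
skewed = ["2011-02-10"] * 1000
print(rows_per_reducer(skewed, 4))

# High-cardinality keys spread the same 1000 rows across all 4 reducers.
spread = ["user-%d" % i for i in range(1000)]
print(rows_per_reducer(spread, 4))
```

The practical takeaway is that adding reducers cannot help if the grouping/join key is effectively constant within the query's input; redistributing on a higher-cardinality key (or restructuring the query) is what spreads the load.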