How many days of data are you working on?
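Data volume matters here because, with mapred.reduce.tasks = -1, Hive sizes the reduce phase from the total input size. It uses roughly one reducer per hive.exec.reducers.bytes.per.reducer of input, capped at hive.exec.reducers.max. Below is a minimal sketch of that estimate. The helper name is hypothetical, and the defaults (about 1 GB per reducer, cap of 999) are assumptions based on older Hive releases. Check the values in your own configuration.

```python
# Hypothetical helper: approximates how Hive sizes the reduce phase when
# mapred.reduce.tasks = -1. The 1 GB / 999 defaults are ASSUMPTIONS based on
# older Hive releases; check hive.exec.reducers.bytes.per.reducer and
# hive.exec.reducers.max in your own configuration.
def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int = 1_000_000_000,
                      max_reducers: int = 999) -> int:
    # Ceiling division: one reducer per bytes_per_reducer of input.
    reducers = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Never fewer than 1, never more than the configured cap.
    return max(1, min(max_reducers, reducers))


# Hypothetical input sizes, for illustration only.
print(estimate_reducers(3_500_000_000))      # one day  -> 4
print(estimate_reducers(7 * 3_500_000_000))  # one week -> 25
```

This estimate only decides how many reducers run. Which rows reach which reducer is decided by how the reduce key is partitioned, so a higher reducer count does not by itself spread the work.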
-----Original Message-----
From: Viral Bajaria <viral.baja...@gmail.com>
Date: Thu, 10 Feb 2011 15:21:32
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: hive : question about reducers

I don't have any explicit bucketing in my data. The data is partitioned by current_date (it has no hour information, so each partition is basically 24 hours of data). It's not a blocker, because the job does eventually complete (super slowly), but it would be nice to know the reason behind this behavior and how I could optimize it, so that I can take full advantage of having multiple reducers running.

-Viral

On Thu, Feb 10, 2011 at 3:02 PM, Ajo Fod <ajo....@gmail.com> wrote:
> I've had similar experiences ... usually with bucketing.
>
> Is this your experience too?
>
> -Ajo
>
>
> On Thu, Feb 10, 2011 at 1:57 PM, Viral Bajaria <viral.baja...@gmail.com> wrote:
>
>> Hello,
>>
>> In my Hive cluster, I have set mapred.reduce.tasks to -1, i.e. I am
>> allowing Hive to figure out the number of reducers it needs from the
>> data.
>>
>> When I run a query, it determines that it will need 4 reducers, but when I
>> look at the mapred logs, I see that all the work is done by a single reducer
>> while the other 3 reducers forward 0 rows. Is this just bad planning on
>> Hive's side, or am I missing something?
>>
>> Thanks,
>> Viral
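The thread doesn't show the query, but one common cause of "one reducer does everything, the others forward 0 rows" is key skew. If the reduce key (for example, a GROUP BY or join on current_date within a single day's partition) has only one distinct value, a hash partitioner sends every row to the same reducer. Below is a minimal sketch of that effect. It reuses the formula of Hadoop's HashPartitioner, `(hash & Integer.MAX_VALUE) % numReduceTasks`, with Java's String.hashCode. This is an assumption for illustration: Hive computes its own hash over the reduce-key columns, so the exact reducer index will differ. Still, the conclusion that one distinct key always maps to one reducer holds for any deterministic hash partitioner. All function names here are hypothetical.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() as a signed 32-bit int (exact for ASCII keys)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def partition_for(key: str, num_reducers: int) -> int:
    """Same formula as Hadoop's HashPartitioner: (hash & Integer.MAX_VALUE) % n."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def rows_per_reducer(keys, num_reducers: int) -> list:
    """Count how many rows each reducer would receive for the given reduce keys."""
    counts = Counter(partition_for(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# One distinct key (a single current_date value): every row lands on one reducer.
one_day = ["2011-02-10"] * 1_000
print(rows_per_reducer(one_day, 4))

# 24 distinct hour keys (hypothetical finer-grained key): rows reach all four reducers.
by_hour = [f"2011-02-10 {h:02d}" for h in range(24) for _ in range(10)]
print(rows_per_reducer(by_hour, 4))
```

If this is the cause, adding reducers won't help. What helps is a reduce key with more distinct values (for example, something finer-grained than the date), so the partitioner has something to spread.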