There were 3 different queries that exhibited this behavior ... one was over 30 days' worth of data and 2 were over 7 days' worth of data.
On Thu, Feb 10, 2011 at 3:49 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

> How many days of data are you working on?
>
> Sent via BlackBerry
> ------------------------------
> *From:* Viral Bajaria <viral.baja...@gmail.com>
> *Date:* Thu, 10 Feb 2011 15:21:32 -0800
> *To:* <user@hive.apache.org>
> *ReplyTo:* user@hive.apache.org
> *Subject:* Re: hive : question about reducers
>
> I don't have any explicit bucketing in my data. The data is partitioned by
> current_date (it has no hour information, so basically 24 hours of data).
>
> It's not a problem because eventually the job completes (super slow), but
> it would be nice to know the reason behind this behavior and how I could
> optimize it so that I can take full advantage of having multiple reducers
> running.
>
> -Viral
>
> On Thu, Feb 10, 2011 at 3:02 PM, Ajo Fod <ajo....@gmail.com> wrote:
>
>> I've had similar experiences ... usually with bucketing.
>>
>> Is this your experience too?
>>
>> -Ajo
>>
>> On Thu, Feb 10, 2011 at 1:57 PM, Viral Bajaria <viral.baja...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> In my Hive cluster, I have set mapred.reduce.tasks to -1, i.e. I am
>>> allowing Hive to figure out the number of reducers it needs from the
>>> data.
>>>
>>> When I run a query, it determines that it will need 4 reducers, but when
>>> I look at the MapReduce logs, I see that all the work is done by a single
>>> reducer while the other 3 reducers forward 0 rows. Is this just bad
>>> planning on Hive's side, or am I missing something?
>>>
>>> Thanks,
>>> Viral
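For reference, with mapred.reduce.tasks = -1 Hive estimates the reducer count from input size using the settings below (the values shown are illustrative, not recommendations):

```sql
-- Let Hive estimate the reducer count from input size (the setup in this thread).
SET mapred.reduce.tasks = -1;

-- Hive derives the estimate roughly as total input bytes / bytes-per-reducer,
-- capped by the max below. Lowering bytes.per.reducer yields more reducers,
-- but it cannot fix skew: rows are still routed by the hash of the shuffle
-- key, so one hot key still lands on one reducer.
SET hive.exec.reducers.bytes.per.reducer = 256000000;  -- illustrative value
SET hive.exec.reducers.max = 999;
```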
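For what it's worth, the "one busy reducer, three reducers forwarding 0 rows" symptom is exactly what hash partitioning produces when the shuffle key has a single dominant value (e.g. every row carrying the same current_date). A minimal sketch of the idea, not Hive's actual code, with made-up data and reducer count:

```python
def partition(key, num_reducers):
    # Mimics Hadoop's HashPartitioner: hash(key) mod numReduceTasks.
    return hash(key) % num_reducers

def rows_per_reducer(keys, num_reducers):
    # Count how many rows each reducer would receive during the shuffle.
    counts = [0] * num_reducers
    for k in keys:
        counts[partition(k, num_reducers)] += 1
    return counts

# Skewed: every row shares one key, so a single reducer does all the work
# and the other reducers receive 0 rows.
skewed = ["2011-02-10"] * 1000
print(rows_per_reducer(skewed, 4))

# High-cardinality keys spread the same 1000 rows across all 4 reducers.
spread = ["user-%d" % i for i in range(1000)]
print(rows_per_reducer(spread, 4))
```

The practical takeaway is that adding reducers cannot help if the grouping/join key is effectively constant within the query's input; redistributing on a higher-cardinality key (or restructuring the query) is what spreads the load.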