I figured it out.

To help the future generations:
The problem was in the property hive.groupby.mapaggr.checkinterval, which
defaults to 100000. Since I was doing a 'group by' query, each row was ~4KB,
and each mapper got only 30000 rows, no mapper ever processed enough rows
for the checkinterval option to kick in. So I set
hive.groupby.mapaggr.checkinterval=1000.
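
A sketch of the fix described above, assuming a working `hive` CLI (the property name is the real one from the thread; the GROUP BY query is a placeholder, not the exact query I ran):

```shell
# Lower the row interval at which Hive's map-side aggregation
# re-checks its in-memory hash table (default: 100000 rows).
hive -e "
  SET hive.groupby.mapaggr.checkinterval=1000;
  SELECT some_key, count(*) FROM temp_view GROUP BY some_key;
"
```

The same SET statement can go into hive-site.xml or a session if you want it to apply to every query.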

Worked like a charm.
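
A quick sanity check on the numbers above (a simplified model; Hive's actual check also involves properties like hive.map.aggr.hash.percentmemory):

```shell
# Each mapper saw only 30000 rows, well below the default check
# interval of 100000, so the memory check never ran even once.
echo $((30000 / 100000))   # prints 0 -> zero checks with the default
echo $((30000 / 1000))     # prints 30 -> regular checks after tuning
# Rough upper bound on raw 4KB rows buffered per mapper:
echo $((30000 * 4 / 1024)) # prints 117 (MB)
```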

Thanks.

2012/3/20 Alexander Ershov <vohs...@gmail.com>

> I said it wrong: what really bothers me is not the 500MB of RAM usage -
> it's that a mapper that starts as a 70-200MB happy chimp becomes a
> 500-600MB bad-smelling gorilla. And that's on the simplest query! As far
> as I understand the Hive source code, the UDF length and the UDAF max are
> super careful with memory allocations. Same with get_json_object. And
> it's Java; it has modest GC capabilities.
>
> The question is: is increasing RAM consumption an unavoidable feature of
> Hive? Or have I somehow fouled up the Java or Hive configuration? Non-Hive
> Hadoop jobs work fine using a constant amount of memory.
>
> Thanks for your support.
>
> Actually I have a total of about 180 mappers. I meant 7 mappers per node.
>
>
> 2012/3/20 Bejoy Ks <bejoy...@yahoo.com>
>
>> Hi Alex
>>       In good clusters you have a child task JVM size of 1.5 or 2GB
>> (or at least 1GB). IMHO, 500MB for a task is pretty normal
>> memory consumption.
>> Now, for 50GB of data you have just 7 mappers; you need to increase the
>> number of mappers for better parallelism.
>>
>> Regards
>> Bejoy
>>
>>   ------------------------------
>> *From:* Alexander Ershov <vohs...@gmail.com>
>> *To:* user@hive.apache.org
>> *Sent:* Tuesday, March 20, 2012 4:13 PM
>> *Subject:* HIVE mappers eat a lot of RAM
>>
>> Hiya,
>>
>> I'm using Hive 0.7.1 with
>> 1) a moderate 50GB table, let's call it `temp_view`
>> 2) this query: select max(length(get_json_object(json, '$.user_id'))) from
>> temp_view. From my point of view this query is a total joke, nothing
>> serious.
>>
>> Query runs just fine, everyone's happy.
>>
>> But I see massive memory consumption in the map phase: 7 active mappers
>> eating 500MB of RAM each.
>>
>> This is really bad stuff; it means real mappers on real queries will
>> throw OutOfMemory exceptions (and they actually do).
>>
>> Anyone have any ideas about what I'm doing wrong? Because I have none.
>>
>>
>>
>
