I said it wrong: what really bothers me is not 500MB of RAM usage per se -
it's that a mapper that starts out as a happy 70-200MB chimp becomes a
bad-smelling 500-600MB gorilla. And that's on the simplest of queries! As
far as I understand the Hive source code, the UDF length and the UDAF max
are super careful with memory allocations. Same with get_json_object. And
it's Java - it does have garbage collection capabilities, however modest.
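To pin down which piece is allocating, I suppose one could run the pieces
in isolation and watch the mapper heap - just a rough diagnostic sketch,
using the same table and column as the query quoted below:

    -- exercise only the JSON extraction over the full table
    SELECT count(get_json_object(json, '$.user_id')) FROM temp_view;

    -- exercise only the length/max path, with no JSON parsing
    SELECT max(length(json)) FROM temp_view;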

The question is: is ever-growing RAM consumption an unavoidable feature of
Hive? Or have I somehow fouled up my Java or Hive configuration? Non-Hive
Hadoop jobs work fine, using a constant amount of memory.
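If it helps narrow things down, something like this from the Hive CLI
should show and cap the child JVM heap - a sketch assuming the stock
Hadoop 1.x property name, with 512m as a purely illustrative value:

    -- print the current child JVM options inherited from mapred-site.xml
    SET mapred.child.java.opts;

    -- cap the task heap for this session only (illustrative value)
    SET mapred.child.java.opts=-Xmx512m;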

Thanks for your support.

Actually I have a total of about 180 mappers. I meant 7 mappers per node.
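
On the parallelism point: if smaller splits were the answer, I'd expect
something like the following to raise the mapper count - again just a
sketch, assuming the input format honors mapred.max.split.size
(CombineHiveInputFormat should), and 128MB is an arbitrary example:

    -- ask for ~128MB splits instead of block-sized ones (value illustrative)
    SET mapred.max.split.size=134217728;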

2012/3/20 Bejoy Ks <bejoy...@yahoo.com>

> Hi Alex
>       In good clusters the child task JVM size is set to 1.5 or 2GB
> (or at least 1GB). IMHO, 500MB for a task is pretty normal
> memory consumption.
> Now for 50GB of data you have just 7 mappers; you need to increase the
> number of mappers for better parallelism.
>
> Regards
> Bejoy
>
>   ------------------------------
> *From:* Alexander Ershov <vohs...@gmail.com>
> *To:* user@hive.apache.org
> *Sent:* Tuesday, March 20, 2012 4:13 PM
> *Subject:* HIVE mappers eat a lot of RAM
>
> Hiya,
>
> I'm using HIVE 0.7.1 with
> 1) a moderate 50GB table, let's call it `temp_view`
> 2) this query: select max(length(get_json_object(json, '$.user_id'))) from
> temp_view. From my point of view this query is a total joke, nothing
> serious.
>
> Query runs just fine, everyone's happy.
>
> But I see massive memory consumption in the map phase: 7 active mappers
> eating 500 MB of RAM each.
>
> This is really bad stuff: it means real mappers on real queries will
> throw OutOfMemory exceptions (and they actually do).
>
> Does anyone have any ideas about what I'm doing wrong? Because I have zero.
>
>
>
