I said it wrong: what really bothers me is not the 500MB of RAM usage per se - it's that a mapper that starts out as a 70-200MB happy chimp turns into a 500-600MB bad-smelling gorilla. And that's on the simplest of queries! As far as I can tell from the Hive source code, the length UDF and the max UDAF are very careful with memory allocations, and the same goes for get_json_object. And it's Java, which at least has a garbage collector.
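As a quick sanity check (a sketch only - the property name is the Hadoop-1-era mapred.child.java.opts, which matches the Hive 0.7.1 era, and the 256m cap is purely illustrative), one can pin the mapper heap lower for a single run. If the query still finishes, the 500-600MB is mostly the JVM growing its heap up to the default -Xmx rather than a genuine leak:

    -- run in the Hive CLI before the query; values are illustrative
    SET mapred.child.java.opts=-Xmx256m;
    SELECT max(length(get_json_object(json, '$.user_id')))
    FROM temp_view;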
The question is: is this ever-growing RAM consumption an unavoidable feature of Hive, or have I somehow fouled up my Java or Hive configuration? Non-Hive Hadoop jobs run fine in a constant amount of memory.

Thanks for your support.

Actually, I have about 180 mappers in total. I meant 7 mappers per node.

2012/3/20 Bejoy Ks <bejoy...@yahoo.com>

> Hi Alex
>      In good clusters you have the child task JVM size as 1.5 or 2GB
> (or at least 1GB). IMHO, 500MB for a task is a pretty normal
> memory consumption.
> Now for 50GB of data you are having just 7 mappers; you need to increase
> the number of mappers for better parallelism.
>
> Regards
> Bejoy
>
> ------------------------------
> *From:* Alexander Ershov <vohs...@gmail.com>
> *To:* user@hive.apache.org
> *Sent:* Tuesday, March 20, 2012 4:13 PM
> *Subject:* HIVE mappers eat a lot of RAM
>
> Hiya,
>
> I'm using Hive 0.7.1 with
> 1) a moderate 50GB table, let's call it `temp_view`
> 2) the query: select max(length(get_json_object(json, '$.user_id')))
> from temp_view. From my point of view this query is a total joke,
> nothing serious.
>
> The query runs just fine, everyone's happy.
>
> But I see massive memory consumption in the map phase: 7 active mappers
> eating 500MB of RAM each.
>
> This is really bad news; it means real mappers on real queries will
> throw OutOfMemoryError (and they actually do).
>
> Does anyone have any idea what I'm doing wrong? Because I have zero.
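One way to act on Bejoy's suggestion to raise mapper parallelism (a sketch under assumptions: CombineHiveInputFormat with the Hadoop-1-era split-size properties, and a 128MB split size chosen purely for illustration) is to lower the maximum split size so the 50GB table is carved into more, smaller map tasks - roughly 50GB / 128MB ≈ 400 mappers instead of the current count:

    -- run in the Hive CLI before the query; values are illustrative
    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    SET mapred.max.split.size=134217728;   -- 128MB per split
    SET mapred.min.split.size=1;

Smaller splits mean each mapper scans less JSON, so its working set stays closer to the 70-200MB starting footprint; the trade-off is more task-startup overhead.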