[ https://issues.apache.org/jira/browse/HIVE-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558917#comment-16558917 ]
Gopal V commented on HIVE-20153:
--------------------------------

LGTM - +1, tests pending.

This extra field is still taking up a meaningful amount of memory for the objects in the heap. From JOL:

{code}
***** 64-bit VM: **********************************************************

org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumEvaluator$SumAgg object internals:
 OFFSET  SIZE                TYPE DESCRIPTION                    VALUE
      0    16                     (object header)                N/A
     16     1             boolean SumAgg.empty                   N/A
     17     7                     (alignment/padding gap)
     24     8    java.lang.Object SumAgg.sum                     N/A
     32     8   java.util.HashSet SumAgg.uniqueObjects           N/A
Instance size: 40 bytes
Space losses: 7 bytes internal + 0 bytes external = 7 bytes total

...

***** 64-bit VM, compressed references enabled: ***************************

org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumEvaluator$SumAgg object internals:
 OFFSET  SIZE                TYPE DESCRIPTION                    VALUE
      0    12                     (object header)                N/A
     12     1             boolean SumAgg.empty                   N/A
     13     3                     (alignment/padding gap)
     16     4    java.lang.Object SumAgg.sum                     N/A
     20     4   java.util.HashSet SumAgg.uniqueObjects           N/A
Instance size: 24 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}

A PTF-specific sub-class would remove that part; let me also think of a way of having a SumAggEmpty class (the "which class is it" information goes into the 12-byte object header).

> Count and Sum UDF consume more memory in Hive 2+
> ------------------------------------------------
>
>                 Key: HIVE-20153
>                 URL: https://issues.apache.org/jira/browse/HIVE-20153
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 2.3.2
>            Reporter: Szehon Ho
>            Assignee: Aihua Xu
>            Priority: Major
>         Attachments: HIVE-20153.1.patch, Screen Shot 2018-07-12 at 6.41.28 PM.png
>
>
> While playing with Hive 2, we noticed that queries with many count() and sum() aggregations run out of memory on the Hadoop side, where they worked before in Hive 1.
> For many queries we have to double the mapper memory settings (in our particular case, mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), which makes it not so easy to upgrade to Hive 2.
> Taking a heap dump, we see that one of the main culprits is the field 'uniqueObjects' in GenericUDAFSum and GenericUDAFCount, which was added to support window functions.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
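
As an illustrative sketch of the idea in the comment above, not the attached HIVE-20153.1.patch: keeping the windowing-only HashSet in a PTF-specific sub-class means the plain aggregation buffers no longer carry the reference at all. The class and field names below (SumAggBase, WindowingSumAgg) are hypothetical; the real buffer is the nested SumAgg class in org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum.

{code}
import java.util.HashSet;

// Hypothetical sketch only -- names do not match the Hive source tree.
public class SumAggLayoutSketch {

    // Base buffer for ordinary GROUP BY sums: just the empty flag and the
    // running sum, no per-buffer HashSet, so each instance stays small.
    static class SumAggBase {
        boolean empty = true;
        Object sum;   // a writable sum value in Hive (e.g. LongWritable)
    }

    // Windowing (PTF) variant: only this subclass pays for the HashSet that
    // window-function evaluation uses to track already-seen rows.
    static class WindowingSumAgg extends SumAggBase {
        HashSet<Object> uniqueObjects = new HashSet<>();
    }

    public static void main(String[] args) {
        SumAggBase plain = new SumAggBase();          // cheap buffer for plain sum()
        SumAggBase windowed = new WindowingSumAgg();  // windowed sum() opts into the extra field
        System.out.println(plain.getClass() + " vs " + windowed.getClass());
    }
}
{code}

Only the window-function path would allocate the subclass, so the per-group buffers created by ordinary aggregations avoid the extra reference; which variant a given buffer is can be read from its object header, as the comment notes.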