[ https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Barnabas Maidics updated HIVE-20760:
------------------------------------
    Attachment: HIVE-20760.12.patch
        Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --------------------------------------------------
>
>                 Key: HIVE-20760
>                 URL: https://issues.apache.org/jira/browse/HIVE-20760
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Barnabas Maidics
>            Assignee: Barnabas Maidics
>            Priority: Major
>         Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, HIVE-20760.12.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, hiveconf_interned.html, hiveconf_original.html
>
>
> The issue is that every Hive task has to load its own copy of {{HiveConf}}. When running with a large number of cores per executor (HoS), a significant (~10%) amount of memory is wasted due to this duplication.
> I looked into the problem and found a way to reduce the overhead caused by the multiple HiveConf objects.
> I've created an implementation of Properties, somewhat similar to CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve this problem, because it drops the interned Properties right after we add a new property.
> My implementation looks like this:
> * When we create a new HiveConf from an existing one (copy constructor), we change the properties object stored by HiveConf to the new Properties implementation (HiveConfProperties). There are two possible ways to do this: either change the visibility of the properties field in the ancestor class (Configuration, which comes from Hadoop) to protected, or, more simply, swap in the new implementation using reflection.
> * HiveConfProperties immediately interns the given properties. After that, every property newly added to the HiveConf goes into an additional Properties object. This way, if we create multiple HiveConfs from the same base properties, they all share the same interned Properties object, while each session/task can still add its own unique properties.
> * Getting a property from HiveConfProperties looks like this (the non-interned properties are stored in the superclass; see the sketch after this message):
>     String property = super.getProperty(key);
>     if (property == null) property = interned.getProperty(key);
>     return property;
> Running some tests showed that the interning works (with 50 connections to HiveServer2, heap dumps created after sessions were created for queries):
> Overall memory:
> original: 34,599K    interned: 20,582K
> Retained memory of HiveConfs:
> original: 16,366K    interned: 10,804K
> I attach the JXray reports about the heap dumps.
> What are your thoughts about this solution?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
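Below is a minimal, illustrative sketch of the HiveConfProperties idea described in the issue: the shared base properties are interned once, per-session additions live in the superclass, and lookups fall back from the local overlay to the interned base. The class shape, constructor, and the installInto helper are assumptions for illustration (the real classes are in the attached patches); the reflection variant relies on the private "properties" field of Hadoop's Configuration mentioned in the description.

{code:java}
import java.lang.reflect.Field;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;

/**
 * Illustrative sketch only: shared (interned) base properties plus a small
 * per-instance overlay stored in the java.util.Properties superclass.
 */
public class HiveConfProperties extends Properties {

  /** Shared base properties, interned once and reused by every HiveConf copy. */
  private final Properties interned;

  public HiveConfProperties(Properties interned) {
    this.interned = interned;
  }

  @Override
  public String getProperty(String key) {
    // Per-session/per-task overrides (stored in the superclass) win first...
    String property = super.getProperty(key);
    if (property == null) {
      // ...otherwise fall back to the shared, interned base properties.
      property = interned.getProperty(key);
    }
    return property;
  }

  /**
   * Reflection variant from the description: replace Configuration's private
   * "properties" field with a HiveConfProperties that shares the given base.
   * Shown purely for illustration; the attached patches may do this differently.
   */
  static void installInto(Configuration conf, Properties sharedBase)
      throws ReflectiveOperationException {
    Field field = Configuration.class.getDeclaredField("properties");
    field.setAccessible(true);
    field.set(conf, new HiveConfProperties(sharedBase));
  }
}
{code}

A complete implementation would presumably also override the enumeration- and size-related methods of Properties (e.g. propertyNames(), stringPropertyNames(), keys()) so that iteration sees both the interned base and the local additions; the sketch above only covers the lookup path discussed in the issue.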