[ https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692104#comment-16692104 ]
Hive QA commented on HIVE-20760:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12948737/HIVE-20760.7.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15551 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.hooks.TestHiveProtoLoggingHook.testRolloverFiles (batchId=319)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/14989/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14989/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14989/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12948737 - PreCommit-HIVE-Build

> Reducing memory overhead due to multiple HiveConfs
> --------------------------------------------------
>
>                 Key: HIVE-20760
>                 URL: https://issues.apache.org/jira/browse/HIVE-20760
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Barnabas Maidics
>            Assignee: Barnabas Maidics
>            Priority: Major
>         Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.patch, hiveconf_interned.html, hiveconf_original.html
>
> The issue is that every Hive task has to load its own copy of {{HiveConf}}. When running with a large number of cores per executor (HoS, i.e. Hive on Spark), a significant (~10%) amount of memory is wasted due to this duplication.
> I looked into the problem and found a way to reduce the overhead caused by the multiple HiveConf objects.
> I've created an implementation of Properties, somewhat similar to CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve this problem, because it drops the interned Properties right after we add a new property.
> So my implementation looks like this (a sketch follows the list):
> * When we create a new HiveConf from an existing one (copy constructor), we change the properties object stored by HiveConf to the new Properties implementation (HiveConfProperties). There are two possible ways to do this: either change the visibility of the properties field in the ancestor class (Configuration, which comes from Hadoop) to protected, or, more simply, change the type using reflection.
> * HiveConfProperties instantly interns the given properties. After this, every time we add a new property to HiveConf, we add it to an additional Properties object. This way, if we create multiple HiveConfs with the same base properties, they share the same Properties object, while each session/task can still add its own unique properties.
> * Getting a property from HiveConfProperties looks like this (the non-interned properties are stored in the superclass):
> String property = super.getProperty(key);
> if (property == null) property = interned.getProperty(key);
> return property;
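> To make the design concrete, here is a minimal, hypothetical sketch of the overlay class, assuming Hadoop's Configuration keeps its values in a private Properties field named properties (as the first bullet implies). The names HiveConfProperties, interned, and install come from the description or are placeholders; the actual patch may differ, and a real implementation would also need to override the enumeration methods (e.g. stringPropertyNames()) so they see both layers:
> {noformat}
> import java.lang.reflect.Field;
> import java.util.Properties;
>
> import org.apache.hadoop.conf.Configuration;
>
> /** Shared interned base plus a per-instance overlay for new properties. */
> public class HiveConfProperties extends Properties {
>     // Read-only base shared by every HiveConf copied from the same source.
>     private final Properties interned;
>
>     public HiveConfProperties(Properties interned) {
>         this.interned = interned;
>     }
>
>     // Writes (setProperty/put) land in this instance, i.e. the overlay
>     // stored in the superclass, so the shared base is never modified.
>
>     // Reads check the overlay first, then fall back to the shared base.
>     @Override
>     public String getProperty(String key) {
>         String property = super.getProperty(key);
>         if (property == null) {
>             property = interned.getProperty(key);
>         }
>         return property;
>     }
>
>     // Reflection variant of the field swap: replace Configuration's
>     // private "properties" field with the overlay implementation.
>     public static void install(Configuration conf, Properties sharedBase) {
>         try {
>             Field f = Configuration.class.getDeclaredField("properties");
>             f.setAccessible(true);
>             f.set(conf, new HiveConfProperties(sharedBase));
>         } catch (ReflectiveOperationException e) {
>             throw new IllegalStateException("Could not swap properties field", e);
>         }
>     }
> }
> {noformat}
> In the copy constructor, passing the existing HiveConf's base Properties as sharedBase would then let all copies point at one shared object while each keeps its own overlay.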
> Running some tests showed that the interning works (with 50 connections to HiveServer2, heap dumps created after sessions are created for queries):
> * Overall memory: original: 34,599K, interned: 20,582K
> * Retained memory of HiveConfs: original: 16,366K, interned: 10,804K
> I attach the JXray reports about the heap dumps.
> What are your thoughts about this solution?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)