[ https://issues.apache.org/jira/browse/HIVE-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502638#comment-16502638 ]
Hive QA commented on HIVE-19668: -------------------------------- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12926217/HIVE-19668.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 14466 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_view_delete] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_multiinsert] (batchId=87) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_unqual_corr_expr] (batchId=8) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multiinsert] (batchId=145) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11536/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11536/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11536/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12926217 - PreCommit-HIVE-Build > Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and > duplicate strings > ---------------------------------------------------------------------------------------------- > > Key: HIVE-19668 > URL: https://issues.apache.org/jira/browse/HIVE-19668 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 > Affects Versions: 3.0.0 > Reporter: Misha Dmitriev > Assignee: Misha Dmitriev > Priority: Major > Attachments: HIVE-19668.01.patch, image-2018-05-22-17-41-39-572.png > > > I've recently analyzed a HS2 heap dump, obtained when there was a huge memory > spike during compilation of some big query. The analysis was done with jxray > ([www.jxray.com).|http://www.jxray.com)./] It turns out that more than 90% of > the 20G heap was used by data structures associated with query parsing > ({{org.apache.hadoop.hive.ql.parse.QBExpr}}). There are probably multiple > opportunities for optimizations here. One of them is to stop the code from > creating duplicate instances of {{org.antlr.runtime.CommonToken}} class. See > a sample of these objects in the attached image: > !image-2018-05-22-17-41-39-572.png|width=879,height=399! > Looks like these particular {{CommonToken}} objects are constants, that don't > change once created. I see some code, e.g. in > {{org.apache.hadoop.hive.ql.parse.CalcitePlanner}}, where such objects are > apparently repeatedly created with e.g. {{new > CommonToken(HiveParser.TOK_INSERT, "TOK_INSERT")}} If these 33 token kinds > are instead created once and reused, we will save more than 1/10th of the > heap in this scenario. Plus, since these objects are small but very numerous, > getting rid of them will remove a gread deal of pressure from the GC. > Another source of waste are duplicate strings, that collectively waste 26.1% > of memory. Some of them come from CommonToken objects that have the same text > (i.e. for multiple CommonToken objects the contents of their 'text' Strings > are the same, but each has its own copy of that String). Other duplicate > strings come from other sources, that are easy enough to fix by adding > String.intern() calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)