[ https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928202#comment-15928202 ]
Sergio Peña commented on HIVE-16166: ------------------------------------ [~mi...@cloudera.com] HiveQA already finished, but it could not post the results on the JIRA. Here are the results: https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/4164/ Test Result (10 failures / +10) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_partitioned] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part_varchar] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[special_character_in_tabnames_1] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_join_part_col_char] > HS2 may still waste up to 15% of memory on duplicate strings > ------------------------------------------------------------ > > Key: HIVE-16166 > URL: https://issues.apache.org/jira/browse/HIVE-16166 > Project: Hive > Issue Type: Improvement > Reporter: Misha Dmitriev > Assignee: Misha Dmitriev > Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch > > > A heap dump obtained from one of our users shows that 15% of memory is wasted > on duplicate strings, despite the recent optimizations that I made. The > problematic strings just come from different sources this time. See the > excerpt from the jxray (www.jxray.com) analysis attached. > Adding String.intern() calls in the appropriate places reduces the overhead > of duplicate strings with this workload to ~6%. The remaining duplicates come > mostly from JDK internal and MapReduce data structures, and thus are more > difficult to fix. -- This message was sent by Atlassian JIRA (v6.3.15#6346)