[ https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Misha Dmitriev updated HIVE-17237:
----------------------------------
    Status: Patch Available  (was: Open)

> HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-17237
>                 URL: https://issues.apache.org/jira/browse/HIVE-17237
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: HIVE-17237.01.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray (www.jxray.com). It turns out that there are a lot of duplicate strings in memory, which waste 26.4% of the heap. Most of them come from HashMaps referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that somebody has already added code to intern keys and values in the parameters table when it's first set up. However, when more key-value pairs are added, they are not interned, which probably explains all these duplicate strings. Also, when a Partition instance is deserialized, no interning of parameters is currently done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557 Unique strings: 460,390 Duplicate values: 110,232 Overhead: 3,220,458K (26.4%)
> ....
> ===================================================
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
> 2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 28 of "2", 21 of "0"
> <-- {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <-- {j.u.ArrayList} <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>
> 463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 of "2", 3 of "0"
> <-- {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <-- {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>
> 233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 of "10"
> ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
> <-- {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <-- {j.u.ArrayList} <-- Java Local (j.u.ArrayList) [@4f4cfbd10,@536122408,@726616778]
> ...
> 52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
> <-- {j.u.HashMap}.keys <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <-- {j.u.ArrayList} <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
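
The issue description says that parameters are interned only when the map is first set up, not when pairs are added later or when a Partition is deserialized. A minimal sketch of what interning on every write path might look like is given below. It is an illustration only, not the code in HIVE-17237.01.patch: the class `InternedParameters`, its methods, and the key `exampleKey` are hypothetical names, and it relies only on the standard `String.intern()` call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the actual Hive Partition code.
public class InternedParameters {

    // Null-safe wrapper around String.intern().
    static String intern(String s) {
        return s == null ? null : s.intern();
    }

    // Copy a map, interning every key and value. This corresponds to
    // what one would do when a map is set as a whole or deserialized.
    static Map<String, String> internAll(Map<String, String> in) {
        if (in == null) {
            return null;
        }
        Map<String, String> out = new HashMap<>(in.size());
        for (Map.Entry<String, String> e : in.entrySet()) {
            out.put(intern(e.getKey()), intern(e.getValue()));
        }
        return out;
    }

    // Intern a single pair when it is added after the initial setup.
    static void putInterned(Map<String, String> params, String key, String value) {
        params.put(intern(key), intern(value));
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        Map<String, String> b = new HashMap<>();
        // new String(...) forces distinct objects, as deserialization would.
        putInterned(a, new String("exampleKey"), new String("-1"));
        putInterned(b, new String("exampleKey"), new String("-1"));
        // Both maps now share one "-1" instance instead of holding two copies.
        System.out.println(a.get("exampleKey") == b.get("exampleKey")); // prints "true"
    }
}
```

With values such as "-1" and "true" repeated tens of thousands of times in the report above, sharing one instance per distinct value is where the saving would come from. The trade-off is the cost of the intern lookup on each write.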