[ https://issues.apache.org/jira/browse/HIVE-16489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Misha Dmitriev resolved HIVE-16489.
-----------------------------------
    Resolution: Duplicate

> HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-16489
>                 URL: https://issues.apache.org/jira/browse/HIVE-16489
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>
> I've just analyzed an HMS heap dump. It turns out that it contains a lot of
> duplicate strings that waste 26.4% of the heap. Most of them come from
> HashMaps referenced by
> org.apache.hadoop.hive.metastore.api.Partition.parameters. Below is the
> relevant section of the jxray (www.jxray.com) report. Looking at
> Partition.java, I see that in the past somebody has already added code to
> intern keys and values in the parameters table when it's first set up.
> However, it looks like when more key-value pairs are added, they are not
> interned, and that probably explains all these duplicate strings.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557 Unique strings: 460,390 Duplicate values: 110,232 Overhead: 3,220,458K (26.4%)
> Top duplicate strings:
>   Ovhd            Num char[]s  Num objs  Value
>   46,088K (0.4%)  5871         5871      "HBa4rRAAGx2MEmludGVyZXN0cmF0ZXNwcmVhZBgM/wD/AP8AXAAAAqEAERYBFQAXAAAAAAAAIEAWuK0QAA1s ...[length 4000]"
>   46,088K (0.4%)  5871         5871      "BQcHBQUGBQgGBQcHCAUGCAkECQcFBQwGBgoJBQYHBQUFBQYKBQgIBgUJEgYFDAYJBgcGBAcLBQYGCAgGCQYG ...[length 4000]"
> ...
> ===================================================
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
> 2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing arrays:
>   39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 3560]"
>   ... and 419200 more strings, of which 36376 are unique
>   Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 28 of "2", 21 of "0"
>   <-- {j.u.HashMap}.values
>   <-- org.apache.hadoop.hive.metastore.api.Partition.parameters
>   <-- {j.u.ArrayList}
>   <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>   <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> 463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing arrays:
>   7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
>   ... and 84009 more strings, of which 34065 are unique
>   Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 of "2", 3 of "0"
>   <-- {j.u.HashMap}.values
>   <-- org.apache.hadoop.hive.metastore.api.Partition.parameters
>   <-- {j.u.TreeMap}.values
>   <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
> 233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
>   4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 of "10", 623 of "CQUJBQcFCAcGBwUFCgUIDAgEBwgFBQcHBwgGBwYEBQoLCggFCAYHBgcIBwkIDgcG ...[length 4000]", 623 of "BQcHBQUGBQgGBQcHCAUGCAkECQcFBQwGBgoJBQYHBQUFBQYKBQgIBgUJEgYFDAYJ ...[length 4000]", 623 of "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 3560]", 623 of "AAMAAAEAAAAAAAEAAAAAAQABAAEHAwAKAgAEAwAAAAAAAgAEAAAAAAMAAAADAAAA ...[length 4000]"
>   ... and 44568 more strings, of which 27285 are unique
>   Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>   <-- {j.u.HashMap}.values
>   <-- org.apache.hadoop.hive.metastore.api.Partition.parameters
>   <-- {j.u.ArrayList}
>   <-- Java Local (j.u.ArrayList) [@4f4cfbd10,@536122408,@726616778]
> ...
> {code}
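
The suspected gap in the issue description (entries interned when the parameters map is first set up, but not when pairs are added later) can be illustrated with a small standalone sketch. This is not the code from Partition.java: the class `InternedParameters`, its methods `putRaw` and `putInterned`, and the key `exampleKey` are hypothetical names chosen for illustration. The only real API relied on is `java.lang.String.intern()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an object that carries a parameters map.
public class InternedParameters {
    private final Map<String, String> parameters = new HashMap<>();

    // Adds a pair as-is: each caller-supplied String instance is retained,
    // so equal values held by different maps remain separate heap objects.
    public void putRaw(String key, String value) {
        parameters.put(key, value);
    }

    // Adds a pair after interning both key and value, so equal strings
    // across all maps resolve to one canonical instance.
    public void putInterned(String key, String value) {
        parameters.put(intern(key), intern(value));
    }

    public String get(String key) {
        return parameters.get(key);
    }

    // Null-safe wrapper around String.intern().
    private static String intern(String s) {
        return s == null ? null : s.intern();
    }

    public static void main(String[] args) {
        InternedParameters a = new InternedParameters();
        InternedParameters b = new InternedParameters();

        // new String(...) simulates values that were deserialized separately.
        a.putRaw("exampleKey", new String("8"));
        b.putRaw("exampleKey", new String("8"));
        System.out.println(a.get("exampleKey") == b.get("exampleKey")); // false: two copies

        a.putInterned("exampleKey", new String("8"));
        b.putInterned("exampleKey", new String("8"));
        System.out.println(a.get("exampleKey") == b.get("exampleKey")); // true: one shared copy
    }
}
```

The reference comparison (`==`) is used here on purpose: it shows whether two maps share one String object or hold two equal copies, which is the distinction the heap report measures.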