[ https://issues.apache.org/jira/browse/HIVE-19041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463235#comment-16463235 ]
Misha Dmitriev commented on HIVE-19041:
---------------------------------------

Yes, all interned strings are kept in the JVM-internal equivalent of a concurrent WeakHashMap. Since it's highly specialized, it's very fast, and it has no extra memory overhead when more strings are added to it (the table is quite large and preallocated, so every running JVM already bears that overhead of a few MB). If you are really interested, check this article: [http://java-performance.info/string-intern-in-java-6-7-8/]

Basically, the only thing you may need to be concerned with when using String.intern() is the CPU overhead. But in my experience, unless interning is mistakenly used for strings that are very short-lived anyway, the benefit of reduced GC outweighs the cost of the extra CPU cycles consumed by the intern() call.

> Thrift deserialization of Partition objects should intern fields
> ----------------------------------------------------------------
>
>                 Key: HIVE-19041
>                 URL: https://issues.apache.org/jira/browse/HIVE-19041
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-19041.01.patch
>
>
> When a client is creating a large number of partitions, the thrift objects are
> deserialized into Partition objects. The read method of these objects does
> not intern the inputformat, location, and outputformat fields, which causes a
> large number of duplicate Strings in the HMS memory. We should intern these
> fields during deserialization to reduce memory pressure.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
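
For illustration, the effect described in the issue can be sketched roughly as follows. This is not the code from HIVE-19041.01.patch: the class `PartitionFields` and the helpers `internOrNull` and `read` are hypothetical stand-ins for the Thrift-generated classes, and `new String(...)` only simulates a deserializer producing a fresh String object for every value it reads.

```java
public class InternSketch {

    // Hypothetical stand-in for the string fields named in the issue
    // (inputformat, outputformat, location); not the real Thrift class.
    static final class PartitionFields {
        final String inputFormat;
        final String outputFormat;
        final String location;

        PartitionFields(String inputFormat, String outputFormat, String location) {
            this.inputFormat = inputFormat;
            this.outputFormat = outputFormat;
            this.location = location;
        }
    }

    // Null-safe wrapper around String.intern(): returns the canonical
    // instance from the JVM string table, or null for a null input.
    static String internOrNull(String s) {
        return s == null ? null : s.intern();
    }

    // Assumed shape of a read step that interns each field as it is set.
    static PartitionFields read(String inputFormat, String outputFormat, String location) {
        return new PartitionFields(
                internOrNull(inputFormat),
                internOrNull(outputFormat),
                internOrNull(location));
    }

    public static void main(String[] args) {
        String format = "org.apache.hadoop.mapred.TextInputFormat";

        // Two separately "deserialized" copies of the same value:
        // equal contents, but distinct objects on the heap.
        String first = new String(format);
        String second = new String(format);
        System.out.println("distinct before interning: " + (first != second));

        // After interning, both partitions share one String instance.
        PartitionFields p1 = read(first, null, "/warehouse/t/part=1");
        PartitionFields p2 = read(second, null, "/warehouse/t/part=2");
        System.out.println("shared after interning: " + (p1.inputFormat == p2.inputFormat));
    }
}
```

Fields such as the input and output format class names repeat across nearly all partitions of a table, so they are where deduplication would be expected to save the most; the CPU cost mentioned in the comment is the table lookup performed by each `intern()` call.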