[ 
https://issues.apache.org/jira/browse/HIVE-19041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463235#comment-16463235
 ] 

Misha Dmitriev commented on HIVE-19041:
---------------------------------------

Yes, all interned strings are kept in the JVM's internal equivalent of a 
concurrent WeakHashMap. Since it's highly specialized, it's very fast, and has 
no extra overhead when more strings are added to it (because it's quite large 
and preallocated, so actually every running JVM already bears this memory 
overhead of a few MB). If you are really interested, check this article: 
[http://java-performance.info/string-intern-in-java-6-7-8/] 

Basically, the only thing you may need to be concerned about when using 
String.intern() is the CPU overhead. But in my experience, unless interning is 
mistakenly used for strings that are very short-lived anyway, the benefit of 
reduced GC outweighs the cost of the extra CPU cycles consumed by the intern() 
call.
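
To illustrate the effect, here is a minimal sketch, not the actual Hive or 
Thrift code. StorageInfo, freshCopy, build and distinctInputFormats are 
hypothetical names made up for this example. It simulates a deserializer that 
allocates a new String for every field it reads, and counts how many distinct 
String instances survive with and without intern():

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternSketch {
    // Hypothetical stand-in for a deserialized object; not Hive's real class.
    static final class StorageInfo {
        final String inputFormat;

        StorageInfo(String inputFormat, boolean intern) {
            // With interning, equal strings collapse to one canonical instance.
            this.inputFormat = intern ? inputFormat.intern() : inputFormat;
        }
    }

    // Simulates deserialization: every read produces a brand-new String object.
    static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    static List<StorageInfo> build(int n, boolean intern) {
        List<StorageInfo> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String value = freshCopy("org.apache.hadoop.mapred.TextInputFormat");
            result.add(new StorageInfo(value, intern));
        }
        return result;
    }

    // Counts distinct String objects by reference identity, not by equals().
    static int distinctInputFormats(List<StorageInfo> infos) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (StorageInfo info : infos) {
            seen.add(info.inputFormat);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern: " + distinctInputFormats(build(1000, false)));
        System.out.println("with intern:    " + distinctInputFormats(build(1000, true)));
    }
}
```

Without interning, each of the 1000 objects holds its own copy of the same 
text. With interning they all share one instance, so the duplicates become 
garbage immediately instead of staying reachable.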

> Thrift deserialization of Partition objects should intern fields
> ----------------------------------------------------------------
>
>                 Key: HIVE-19041
>                 URL: https://issues.apache.org/jira/browse/HIVE-19041
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-19041.01.patch
>
>
> When a client is creating a large number of partitions, the thrift objects are 
> deserialized into Partition objects. The read method of these objects does 
> not intern the inputformat, location, and outputformat fields, which causes a 
> large number of duplicate Strings in HMS memory. We should intern these fields 
> during deserialization to reduce memory pressure. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
