[ https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528876#comment-16528876 ]
Misha Dmitriev commented on HIVE-19937: ---------------------------------------

[~stakiar] regarding the behavior of {{CopyOnFirstWriteProperties}}: such fine-grained behavior would be easy to implement. It would require changing the implementation of this class so that it holds pointers to two hashtables: one for properties that are specific/unique to the given instance of {{COFWP}}, and another for properties that are common/default across all instances of {{COFWP}}. Each get() call would first check the first (specific) hashtable and then the second (default) hashtable, and each put() call would work only with the first (specific) hashtable. This would make sense when there is a sufficiently large number of common properties but every (or almost every) table also has some specific properties. In contrast, the current {{CopyOnFirstWriteProperties}} works best when most tables are exactly the same and only a few differ. That said, after writing all this I realize that the proposed implementation of {{COFWP}} would probably be better in all scenarios. But before deciding on anything, we should definitely measure where the memory goes in realistic scenarios.

Regarding interning only values in {{PartitionDesc#internProperties}}: yes, I think this was intentional. I carefully analyzed heap dumps before making this change, so if it had been worth interning the keys, I would have done that too. Most probably, when these tables are created, the Strings for the keys already come from some source where they are interned.
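The two-hashtable design described above can be sketched roughly as follows. This is a minimal illustration, not Hive's actual {{CopyOnFirstWriteProperties}} API; the class and method names are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed design: a map of common/default
// properties shared by all instances, plus a small per-instance map
// for properties specific to this instance.
public class LayeredProperties {
    private final Map<String, String> defaults;                   // shared, never mutated here
    private final Map<String, String> specific = new HashMap<>(); // per-instance overrides

    public LayeredProperties(Map<String, String> defaults) {
        this.defaults = defaults;
    }

    // get() checks the specific table first, then falls back to the defaults.
    public String get(String key) {
        String v = specific.get(key);
        return (v != null) ? v : defaults.get(key);
    }

    // put() only ever writes to the specific table, so the shared defaults
    // stay untouched and can be safely referenced by all instances.
    public void put(String key, String value) {
        specific.put(key, value);
    }
}
```

Incidentally, stock {{java.util.Properties}} already supports this layering: its {{Properties(Properties defaults)}} constructor installs a defaults table that {{getProperty}} falls back to, which mirrors the get()/put() split sketched here.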
> Intern JobConf objects in Spark tasks
> -------------------------------------
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-19937.1.patch
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from
> being thrown. However, setting this variable comes at a cost of storing a
> duplicate {{JobConf}} object for each Spark task. These objects can take up a
> significant amount of memory; we should intern them so that Spark tasks
> running in the same JVM don't store duplicate copies.
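The interning idea in the description can be illustrated with a simple canonicalizing pool: tasks in the same JVM look up a shared canonical copy of an immutable config instead of each keeping its own duplicate. The {{Config}} class and {{ConfigInterner}} below are hypothetical stand-ins, not Hive's actual {{JobConf}} handling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of object interning: the first instance with a given content
// fingerprint becomes the canonical copy; later equal instances are
// discarded in favor of it, so only one copy lives in the JVM.
public class ConfigInterner {
    private static final Map<String, Config> POOL = new ConcurrentHashMap<>();

    public static Config intern(Config c) {
        Config existing = POOL.putIfAbsent(c.fingerprint(), c);
        return (existing != null) ? existing : c;
    }

    // Minimal immutable config with a content-based fingerprint.
    public static final class Config {
        private final String contents;

        public Config(String contents) {
            this.contents = contents;
        }

        public String fingerprint() {
            return contents;
        }
    }
}
```

A real implementation would want weak references (as in Guava's {{Interners.newWeakInterner()}}) so canonical copies can be garbage-collected once no task references them; this sketch leaks entries by design to stay short.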