Re: Review Request 57353: Intern Properties objects referenced from PartitionDesc to reduce memory pressure.

Misha Dmitriev Fri, 21 Apr 2017 13:16:10 -0700


> On April 21, 2017, 4:35 p.m., Sergio Pena wrote:
> > Misha, where is CopyOnFirstWriteProperties used? The patch looks pretty
> > good, but I don't see where CopyOnFirstWriteProperties is instantiated.
> 
> Misha Dmitriev wrote:
>     It's not instantiated directly. Rather, see the 
> serialization/deserialization code in SerializationUtilities.java, where this 
> class is indirectly instantiated. My understanding is that this is how 
> Partitions and their child data structures are created, by transferring data 
> from HMS.
> 
> Sergio Pena wrote:
>     I still have not found how this happens. Could you describe how you
> understand this happens? Maybe I can follow you better than the code.


Right, now I understand what you mean. I made a mistake when making some final
edits to this code. A new CopyOnFirstWriteProperties instance should be created
in the setProperties() method of PartitionDesc. I'll make a fix and post a new
patch.


- Misha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57353/#review172668
-----------------------------------------------------------


On March 7, 2017, 1:22 a.m., Misha Dmitriev wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57353/
> -----------------------------------------------------------
> 
> (Updated March 7, 2017, 1:22 a.m.)
> 
> 
> Review request for hive, Chaozhong Yang, Alan Gates, Rui Li, Prasanth_J, 
> Sergio Pena, Sahil Takiar, Vihang Karajgaonkar, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-16079
>     https://issues.apache.org/jira/browse/HIVE-16079
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> When multiple concurrent Hive queries run, a separate copy of
> org.apache.hadoop.hive.ql.metadata.Partition and
> ql.plan.PartitionDesc is created for each table partition
> in each query instance. So when, as in my benchmark explained in
> HIVE-16079, we have 2000 partitions and 50 concurrent queries running
> over them, we end up, in the worst case, with 2000*50=100,000 instances
> of Partition and PartitionDesc in memory. These objects themselves
> collectively take just ~2% of memory. However, the other data structures
> that each of them references take a lot more. In particular, Properties
> objects take more than 20% of memory. When we have 50 concurrent
> read-only queries, there are 50 identical copies of Properties for
> each partition. That's a huge waste of memory.
> 
> This change introduces a new class that extends Properties, called
> CopyOnFirstWriteProperties. It utilizes a unique interned copy of
> Properties whenever possible. However, when one of the methods that
> modify properties is called, a copy is created. When this class is
> used, memory consumption by Properties falls from 20% to 5-6%.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/CopyOnFirstWriteProperties.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 247d5890ea8131404b9543d22876ca4c052578e0
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java d05c1c68fdb7296c0346d73967071da1ebe7bb72
> 
> 
> Diff: https://reviews.apache.org/r/57353/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Misha Dmitriev
> 
>
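
The copy-on-first-write idea in the description above can be sketched roughly
as follows. This is a minimal illustration and not the code from the patch:
the class name CowPropertiesSketch, the POOL map, and the sharesStorageWith()
helper are all hypothetical, and only getProperty() and setProperty() are
covered. A real subclass of Properties would presumably have to override every
read and write method in the same way.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified sketch of a copy-on-first-write Properties. */
class CowPropertiesSketch extends Properties {
    // Pool of shared snapshots, keyed by content. Snapshots are never
    // mutated after they are added, so their hash codes stay stable.
    private static final ConcurrentHashMap<Properties, Properties> POOL =
        new ConcurrentHashMap<>();

    // Shared read-only snapshot; null once this instance has been written to.
    private Properties interned;

    CowPropertiesSketch(Properties source) {
        Properties snapshot = new Properties();
        snapshot.putAll(source);
        // Reuse an equal snapshot if one is already pooled.
        Properties existing = POOL.putIfAbsent(snapshot, snapshot);
        interned = (existing != null) ? existing : snapshot;
    }

    /** Illustration helper: true while both instances read from one snapshot. */
    synchronized boolean sharesStorageWith(CowPropertiesSketch other) {
        return interned != null && interned == other.interned;
    }

    @Override
    public synchronized String getProperty(String key) {
        // Reads go to the shared snapshot until the first write happens.
        return (interned != null) ? interned.getProperty(key)
                                  : super.getProperty(key);
    }

    @Override
    public synchronized Object setProperty(String key, String value) {
        copyBeforeWrite();
        return super.setProperty(key, value);
    }

    // On the first write, copy the shared entries into this instance's own
    // table and stop referencing the shared snapshot.
    private void copyBeforeWrite() {
        if (interned != null) {
            Properties shared = interned;
            interned = null; // clear first so nested writes do not re-enter
            super.putAll(shared);
        }
    }
}
```

In this sketch, two instances built from equal Properties read from a single
pooled snapshot, which is where the memory saving would come from; calling
setProperty() on one of them gives that instance a private copy and leaves the
other untouched.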
