> On April 21, 2017, 4:35 p.m., Sergio Pena wrote:
> > Misha, where is CopyOnFirstWriteProperties used? The patch looks pretty
> > good, but I don't see where CopyOnFirstWriteProperties is instantiated.
>
> Misha Dmitriev wrote:
>     It's not instantiated directly. Rather, see the
>     serialization/deserialization code in SerializationUtilities.java, where this
>     class is indirectly instantiated. My understanding is that this is how
>     Partitions and their child data structures are created, by transferring data
>     from HMS.
>
> Sergio Pena wrote:
>     I still haven't found how this happens. Could you describe how you understand
>     this happens? Maybe I can follow you better than the code.

Right, now I understand what you mean. I made a mistake when making some final edits to this code. A new CopyOnFirstWriteProperties instance should be created in the setProperties() method of PartitionDesc. I'll make a fix and post a new patch.


- Misha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57353/#review172668
-----------------------------------------------------------


On March 7, 2017, 1:22 a.m., Misha Dmitriev wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57353/
> -----------------------------------------------------------
>
> (Updated March 7, 2017, 1:22 a.m.)
>
>
> Review request for hive, Chaozhong Yang, Alan Gates, Rui Li, Prasanth_J,
> Sergio Pena, Sahil Takiar, Vihang Karajgaonkar, and Xuefu Zhang.
>
>
> Bugs: HIVE-16079
>     https://issues.apache.org/jira/browse/HIVE-16079
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> When multiple concurrent Hive queries run, a separate copy of
> org.apache.hadoop.hive.ql.metadata.Partition and
> ql.plan.PartitionDesc is created for each table partition
> for each query instance. So when, in my benchmark explained in
> HIVE-16079, we have 2000 partitions and 50 concurrent queries running
> over them, we end up, in the worst case, with 2000*50=100,000 instances
> of Partition and PartitionDesc in memory. These objects themselves
> collectively take just ~2% of memory. However, other data structures
> that each of them references take a lot more. In particular, Properties
> objects take more than 20% of memory. When we have 50 concurrent
> read-only queries, there are 50 identical copies of Properties for
> each partition. That's a huge waste of memory.
>
> This change introduces a new class that extends Properties, called
> CopyOnFirstWriteProperties.
> It uses a unique interned copy of
> Properties whenever possible. However, when one of the methods that
> modify properties is called, a copy is created. When this class is
> used, memory consumption by Properties falls from 20% to 5-6%.
>
>
> Diffs
> -----
>
>   common/src/java/org/apache/hadoop/hive/common/CopyOnFirstWriteProperties.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 247d5890ea8131404b9543d22876ca4c052578e0
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java d05c1c68fdb7296c0346d73967071da1ebe7bb72
>
>
> Diff: https://reviews.apache.org/r/57353/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Misha Dmitriev
>
>
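
For readers following the thread, the copy-on-first-write idea from the description can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the class from the patch: the name CopyOnFirstWritePropertiesSketch, the INTERNED pool, and the sharesStorageWith() helper are all hypothetical, and only a handful of Properties methods are overridden.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch only (hypothetical names, not the code under review).
 * Instances built from equal contents share one interned, never-modified
 * Properties object until the first write, at which point the contents are
 * copied into this object's own storage.
 */
class CopyOnFirstWritePropertiesSketch extends Properties {

    // Hypothetical intern pool: maps a snapshot to its canonical instance.
    // Snapshots are never modified after insertion, so using them as keys is safe.
    private static final ConcurrentHashMap<Properties, Properties> INTERNED =
            new ConcurrentHashMap<>();

    // Non-null while this object is still backed by the shared copy.
    private Properties shared;

    CopyOnFirstWritePropertiesSketch(Properties source) {
        Properties snapshot = new Properties();
        snapshot.putAll(source);
        Properties existing = INTERNED.putIfAbsent(snapshot, snapshot);
        this.shared = (existing != null) ? existing : snapshot;
    }

    // Copies the shared contents into this object's own storage, once.
    private synchronized void copyIfShared() {
        if (shared != null) {
            Properties source = shared;
            shared = null;          // clear first: putAll may route through put()
            super.putAll(source);
        }
    }

    // --- reads: served from the shared copy while it is still in use ---
    // Assumption: a complete version would override every read method
    // (keySet, entrySet, stringPropertyNames, ...); those are omitted here
    // and would see empty storage before the first write.

    @Override
    public synchronized String getProperty(String key) {
        return (shared != null) ? shared.getProperty(key) : super.getProperty(key);
    }

    @Override
    public synchronized Object get(Object key) {
        return (shared != null) ? shared.get(key) : super.get(key);
    }

    @Override
    public synchronized int size() {
        return (shared != null) ? shared.size() : super.size();
    }

    // --- writes: trigger the one-time copy, then modify private storage ---
    // setProperty() is documented to call put(), so it is covered as well.

    @Override
    public synchronized Object put(Object key, Object value) {
        copyIfShared();
        return super.put(key, value);
    }

    @Override
    public synchronized Object remove(Object key) {
        copyIfShared();
        return super.remove(key);
    }

    // Demo helper: true while both objects are backed by the same interned copy.
    synchronized boolean sharesStorageWith(CopyOnFirstWritePropertiesSketch other) {
        return shared != null && shared == other.shared;
    }
}
```

With this sketch, two instances built from equal Properties share one backing object for as long as both are only read; calling setProperty() on one of them gives it a private copy and leaves the other untouched. A setter such as the setProperties() method mentioned above could wrap its argument in a class like this, though how the actual patch does that is not shown in this thread.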