-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57353/#review172668
-----------------------------------------------------------

Misha, where is CopyOnFirstWriteProperties used? The patch looks pretty good, but I don't see where CopyOnFirstWriteProperties is instantiated.


common/src/java/org/apache/hadoop/hive/common/CopyOnFirstWriteProperties.java
Lines 314 (patched)
<https://reviews.apache.org/r/57353/#comment245727>

    If Interners.newWeakInterner() returns a thread-safe interner, why do we have to lock the INTERNER only when updating it?


- Sergio Pena


On March 7, 2017, 1:22 a.m., Misha Dmitriev wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57353/
> -----------------------------------------------------------
> 
> (Updated March 7, 2017, 1:22 a.m.)
> 
> 
> Review request for hive, Chaozhong Yang, Alan Gates, Rui Li, Prasanth_J,
> Sergio Pena, Sahil Takiar, Vihang Karajgaonkar, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-16079
>     https://issues.apache.org/jira/browse/HIVE-16079
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> When multiple concurrent Hive queries run, a separate copy of
> org.apache.hadoop.hive.ql.metadata.Partition and
> ql.plan.PartitionDesc is created for each table partition
> for each query instance. So when, in my benchmark explained in
> HIVE-16079, we have 2000 partitions and 50 concurrent queries running
> over them, we end up, in the worst case, with 2000*50=100,000 instances
> of Partition and PartitionDesc in memory. These objects themselves
> collectively take just ~2% of memory. However, the other data structures
> that each of them references take a lot more. In particular, Properties
> objects take more than 20% of memory. When we have 50 concurrent
> read-only queries, there are 50 identical copies of Properties for
> each partition. That's a huge waste of memory.
> 
> This change introduces a new class that extends Properties, called
> CopyOnFirstWriteProperties. It utilizes a unique interned copy of
> Properties whenever possible.
> However, when one of the methods that
> modify properties is called, a copy is created. When this class is
> used, memory consumption by Properties falls from 20% to 5-6%.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/CopyOnFirstWriteProperties.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 247d5890ea8131404b9543d22876ca4c052578e0
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java d05c1c68fdb7296c0346d73967071da1ebe7bb72
> 
> 
> Diff: https://reviews.apache.org/r/57353/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Misha Dmitriev
> 
>
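
For readers following the thread, the copy-on-first-write idea in the quoted description can be sketched roughly as below. This is only an illustration and not the code from the patch: the class name CopyOnFirstWriteSketch and the method sharesStorageWith are hypothetical, the sketch wraps Properties rather than extending it, and a plain ConcurrentHashMap stands in for the weak interner, so interned entries here are never garbage collected.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of copy-on-first-write; not the class from the patch. */
public class CopyOnFirstWriteSketch {
    // Assumption: a strong-reference map replaces the weak interner for brevity.
    private static final Map<Properties, Properties> INTERNER = new ConcurrentHashMap<>();

    // Points at the shared interned instance until the first write.
    private volatile Properties props;
    private boolean shared = true;

    public CopyOnFirstWriteSketch(Properties source) {
        // Take a private snapshot so later changes to 'source' cannot
        // alter an instance that other holders share.
        Properties snapshot = new Properties();
        snapshot.putAll(source);
        // Properties compares by content, so equal snapshots map to one instance.
        Properties existing = INTERNER.putIfAbsent(snapshot, snapshot);
        this.props = (existing != null) ? existing : snapshot;
    }

    /** Reads go straight to whichever instance is current. */
    public String getProperty(String key) {
        return props.getProperty(key);
    }

    /** The first write replaces the shared instance with a private copy. */
    public synchronized Object setProperty(String key, String value) {
        if (shared) {
            Properties copy = new Properties();
            copy.putAll(props);
            props = copy;
            shared = false;
        }
        return props.setProperty(key, value);
    }

    /** True while both wrappers still point at the same underlying instance. */
    public boolean sharesStorageWith(CopyOnFirstWriteSketch other) {
        return this.props == other.props;
    }
}
```

Two wrappers built from equal Properties share one underlying instance until either of them calls setProperty. After that, the writer holds a private copy and the other wrapper still sees the original values, which is the behavior the description relies on for read-only queries.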