> On Feb. 22, 2017, 3:30 a.m., Sahil Takiar wrote:
> > common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java, line 
> > 29
> > <https://reviews.apache.org/r/56687/diff/1/?file=1633965#file1633965line29>
> >
> >     How will this class relate to the intern utils provided in 
> > `HiveStringUtils`?

The string interning methods in HiveStringUtils, that use a custom Google 
interner for strings, are now obsolete. This code was likely created in the old 
days of JDK6, when the standard String.intern() didn't scale well. But since 
JDK7, several improevements were made to String.intern(), and now it works very 
well. So there is no need for this custom code anymore.

Now, the code from my new class StringInternUtils code be moved to 
HiveStringUtils - it's not against logic. The main reason I don't want to do 
that is that HiveStringUtils.java is already huge. My new interning methods 
need some static fields and other intitialization. It's easy enough to 
understand and maintain when this stuff is all close together in its own class. 
But not so good if it gets spread across several places in HiveStringUtils.java.


> On Feb. 22, 2017, 3:30 a.m., Sahil Takiar wrote:
> > common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java, line 
> > 87
> > <https://reviews.apache.org/r/56687/diff/1/?file=1633965#file1633965line87>
> >
> >     Add check to see if path is null

Done.


> On Feb. 22, 2017, 3:30 a.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java,
> >  line 394
> > <https://reviews.apache.org/r/56687/diff/1/?file=1633975#file1633975line394>
> >
> >     Did the changes to the skew join come up when running the `count()` 
> > queries? Or did you notice that this code could benefit from interning and 
> > decide to update it.

Most likely the former. There is so much code in Hive that it would be 
difficult to just browse through it and find all pieces that would potentially 
benefit from interning. So I think I've noticed that this source of Path 
objects is likely responsible for duplicate strings somewhere.


- Misha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56687/#review166298
-----------------------------------------------------------


On Feb. 14, 2017, 11:03 p.m., Misha Dmitriev wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56687/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2017, 11:03 p.m.)
> 
> 
> Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/HIVE-15882
>     
> https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/HIVE-15882
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> See the description of the problem in 
> https://issues.apache.org/jira/browse/HIVE-15882 Interning strings per this 
> review removes most of the overhead due to duplicate strings.
> 
> Also, where maps in several places are created from other maps, use the 
> original map's size for the new map. This is to avoid the situation when a 
> map with default capacity (typically 16) is created to hold just 2-3 entries, 
> and the rest of the internal 16-entry array is wasted.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> e81cbce3e333d44a4088c10491f399e92a505293 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java 
> 08420664d59f28f75872c25c9f8ee42577b23451 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
> e91064b9c75e8adb2b36f21ff19ec0c1539b03b9 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 
> 51530ac16c92cc75d501bfcb573557754ba0c964 
>   ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java 
> 55b3b551a1dac92583b6e03b10beb8172ca93d45 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java 
> 82dc89803be9cf9e0018720eeceb90ff450bfdc8 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 
> c0edde9e92314d86482b5c46178987e79fae57fe 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 
> c6ae6f290857cfd10f1023058ede99bf4a10f057 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 24d16812515bdfa90b4be7a295c0388fcdfe95ef 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
>  ede4fcbe342052ad86dadebcc49da2c0f515ea98 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java
>  0882ae2c6205b1636cbc92e76ef66bb70faadc76 
>   
> ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 
> 68b0ad9ea63f051f16fec3652d8525f7ab07eb3f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 
> d4bdd96eaf8d179bed43b8a8c3be0d338940154a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MsckDesc.java 
> b7a7e4b7a5f8941b080c7805d224d3885885f444 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 
> 73981e826870139a42ad881103fdb0a2ef8433a2 
> 
> Diff: https://reviews.apache.org/r/56687/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Misha Dmitriev
> 
>

Reply via email to