> On Feb. 22, 2017, 3:30 a.m., Sahil Takiar wrote: > > common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java, line > > 29 > > <https://reviews.apache.org/r/56687/diff/1/?file=1633965#file1633965line29> > > > > How will this class relate to the intern utils provided in > > `HiveStringUtils`?
The string interning methods in HiveStringUtils, that use a custom Google interner for strings, are now obsolete. This code was likely created in the old days of JDK6, when the standard String.intern() didn't scale well. But since JDK7, several improevements were made to String.intern(), and now it works very well. So there is no need for this custom code anymore. Now, the code from my new class StringInternUtils code be moved to HiveStringUtils - it's not against logic. The main reason I don't want to do that is that HiveStringUtils.java is already huge. My new interning methods need some static fields and other intitialization. It's easy enough to understand and maintain when this stuff is all close together in its own class. But not so good if it gets spread across several places in HiveStringUtils.java. > On Feb. 22, 2017, 3:30 a.m., Sahil Takiar wrote: > > common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java, line > > 87 > > <https://reviews.apache.org/r/56687/diff/1/?file=1633965#file1633965line87> > > > > Add check to see if path is null Done. > On Feb. 22, 2017, 3:30 a.m., Sahil Takiar wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java, > > line 394 > > <https://reviews.apache.org/r/56687/diff/1/?file=1633975#file1633975line394> > > > > Did the changes to the skew join come up when running the `count()` > > queries? Or did you notice that this code could benefit from interning and > > decide to update it. Most likely the former. There is so much code in Hive that it would be difficult to just browse through it and find all pieces that would potentially benefit from interning. So I think I've noticed that this source of Path objects is likely responsible for duplicate strings somewhere. - Misha ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/56687/#review166298 ----------------------------------------------------------- On Feb. 14, 2017, 11:03 p.m., Misha Dmitriev wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/56687/ > ----------------------------------------------------------- > > (Updated Feb. 14, 2017, 11:03 p.m.) > > > Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena. > > > Bugs: https://issues.apache.org/jira/browse/HIVE-15882 > > https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/HIVE-15882 > > > Repository: hive-git > > > Description > ------- > > See the description of the problem in > https://issues.apache.org/jira/browse/HIVE-15882 Interning strings per this > review removes most of the overhead due to duplicate strings. > > Also, where maps in several places are created from other maps, use the > original map's size for the new map. This is to avoid the situation when a > map with default capacity (typically 16) is created to hold just 2-3 entries, > and the rest of the internal 16-entry array is wasted. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java > e81cbce3e333d44a4088c10491f399e92a505293 > ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java > 08420664d59f28f75872c25c9f8ee42577b23451 > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java > e91064b9c75e8adb2b36f21ff19ec0c1539b03b9 > ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java > 51530ac16c92cc75d501bfcb573557754ba0c964 > ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java > 55b3b551a1dac92583b6e03b10beb8172ca93d45 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java > 82dc89803be9cf9e0018720eeceb90ff450bfdc8 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java > c0edde9e92314d86482b5c46178987e79fae57fe > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java > c6ae6f290857cfd10f1023058ede99bf4a10f057 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java > 24d16812515bdfa90b4be7a295c0388fcdfe95ef > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java > ede4fcbe342052ad86dadebcc49da2c0f515ea98 > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java > 0882ae2c6205b1636cbc92e76ef66bb70faadc76 > > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java > 68b0ad9ea63f051f16fec3652d8525f7ab07eb3f > ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java > d4bdd96eaf8d179bed43b8a8c3be0d338940154a > ql/src/java/org/apache/hadoop/hive/ql/plan/MsckDesc.java > b7a7e4b7a5f8941b080c7805d224d3885885f444 > ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java > 73981e826870139a42ad881103fdb0a2ef8433a2 > > Diff: https://reviews.apache.org/r/56687/diff/ > > > Testing > ------- > > > Thanks, > > Misha Dmitriev > >