> On Aug. 26, 2013, 5:35 a.m., Jakob Homan wrote: > > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java, > > line 529 > > <https://reviews.apache.org/r/12480/diff/3/?file=338097#file338097line529> > > > > Weird spacing... 2x below as well.
Done > On Aug. 26, 2013, 5:35 a.m., Jakob Homan wrote: > > serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java, line 49 > > <https://reviews.apache.org/r/12480/diff/3/?file=338099#file338099line49> > > > > And this would indicate a bug. Done. > On Aug. 26, 2013, 5:35 a.m., Jakob Homan wrote: > > serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java, line 38 > > <https://reviews.apache.org/r/12480/diff/3/?file=338099#file338099line38> > > > > These should never be null, not even in testing. It's better to change > > the tests to correctly populate the data structure. AvroGenericRecordWriteable needs the record reader ID. Since we are instantiating one here. We will need to set it w/o any checking. Remove unnecessary null checkings. - Mohammad ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12480/#review25537 ----------------------------------------------------------- On Aug. 30, 2013, 6:49 p.m., Mohammad Islam wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/12480/ > ----------------------------------------------------------- > > (Updated Aug. 30, 2013, 6:49 p.m.) > > > Review request for hive, Ashutosh Chauhan and Jakob Homan. > > > Bugs: HIVE-4732 > https://issues.apache.org/jira/browse/HIVE-4732 > > > Repository: hive-git > > > Description > ------- > > From our performance analysis, we found AvroSerde's schema.equals() call > consumed a substantial amount ( nearly 40%) of time. This patch intends to > minimize the number schema.equals() calls by pushing the check as late/fewer > as possible. > > At first, we added a unique id for each record reader which is then included > in every AvroGenericRecordWritable. Then, we introduce two new data > structures (one hashset and one hashmap) to store intermediate data to avoid > duplicates checkings. Hashset contains all the record readers' IDs that don't > need any re-encoding. On the other hand, HashMap contains the already used > re-encoders. It works as cache and allows re-encoders reuse. With this > change, our test shows nearly 40% reduction in Avro record reading time. > > > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java > ed2a9af > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java > e994411 > > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java > 66f0348 > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java > 3828940 > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java > 9af751b > serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb > > Diff: https://reviews.apache.org/r/12480/diff/ > > > Testing > ------- > > > Thanks, > > Mohammad Islam > >