-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/
-----------------------------------------------------------

(Updated Aug. 30, 2013, 6:49 p.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Updated with Jakob's comments.


Bugs: HIVE-4732
    https://issues.apache.org/jira/browse/HIVE-4732


Repository: hive-git


Description
-------

From our performance analysis, we found that AvroSerde's schema.equals() call consumed a substantial amount (nearly 40%) of time. This patch aims to minimize the number of schema.equals() calls by deferring the check as late as possible and performing it as rarely as possible.

First, we add a unique ID to each record reader, which is then included in every AvroGenericRecordWritable. Then, we introduce two new data structures (one HashSet and one HashMap) to store intermediate data and avoid duplicate checks. The HashSet contains the IDs of all record readers that do not need any re-encoding. The HashMap contains the re-encoders that have already been used; it works as a cache and allows re-encoders to be reused.

With this change, our test shows a nearly 40% reduction in Avro record reading time.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java ed2a9af
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java e994411
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 3828940
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b
  serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb

Diff: https://reviews.apache.org/r/12480/diff/


Testing
-------


Thanks,

Mohammad Islam
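

Illustration
-------

The caching scheme outlined in the Description could look roughly like the sketch below. It is not the code in the patch: the class name, field names, the ReEncoder interface, and the choice to key the HashMap by reader ID are all assumptions made for illustration, and plain strings stand in for Avro schemas so that the example needs only the JDK. The point it tries to show is that the expensive equals() comparison runs at most once per record reader instead of once per record.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; names and structure are assumptions, not the patch itself.
class ReEncoderCacheSketch {

    /** Hypothetical stand-in for a converter from the file schema to the reader schema. */
    interface ReEncoder {
        String reencode(String record);
    }

    // IDs of record readers whose records are known to need no re-encoding.
    private final Set<UUID> noEncodingNeeded = new HashSet<>();

    // Re-encoders that have already been built, cached for reuse (keyed by reader ID: an assumption).
    private final Map<UUID, ReEncoder> reEncoderCache = new HashMap<>();

    // Counts how often the expensive schema comparison actually runs.
    int equalsCalls = 0;

    String deserialize(UUID readerId, String fileSchema, String readerSchema, String record) {
        // Fast path 1: this reader was already checked and its schemas matched.
        if (noEncodingNeeded.contains(readerId)) {
            return record;
        }
        // Fast path 2: this reader was already checked and a re-encoder exists.
        ReEncoder cached = reEncoderCache.get(readerId);
        if (cached != null) {
            return cached.reencode(record);
        }
        // Slow path: first record from this reader, so compare the schemas once.
        // String.equals() stands in for the costly Avro schema.equals() call.
        equalsCalls++;
        if (fileSchema.equals(readerSchema)) {
            noEncodingNeeded.add(readerId);
            return record;
        }
        ReEncoder created = r -> r + " [re-encoded to " + readerSchema + "]";
        reEncoderCache.put(readerId, created);
        return created.reencode(record);
    }

    public static void main(String[] args) {
        ReEncoderCacheSketch sketch = new ReEncoderCacheSketch();
        UUID readerA = UUID.randomUUID(); // file and reader schemas match
        UUID readerB = UUID.randomUUID(); // schemas differ, re-encoding needed
        for (int i = 0; i < 3; i++) {
            sketch.deserialize(readerA, "schemaV1", "schemaV1", "record" + i);
            sketch.deserialize(readerB, "schemaV1", "schemaV2", "record" + i);
        }
        // Six records were read, but each reader triggered only one comparison.
        System.out.println(sketch.equalsCalls); // prints 2
    }
}
```

Checking the set before the map keeps the common case (matching schemas) down to a single hash lookup per record; whether the patch orders its lookups this way is not stated in the Description.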