Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde

Mohammad Islam Fri, 30 Aug 2013 12:19:32 -0700


> On Aug. 26, 2013, 5:35 a.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java,
> >  line 529
> > <https://reviews.apache.org/r/12480/diff/3/?file=338097#file338097line529>
> >
> >     Weird spacing... 2x below as well.


Done


> On Aug. 26, 2013, 5:35 a.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java, line 49
> > <https://reviews.apache.org/r/12480/diff/3/?file=338099#file338099line49>
> >
> >     And this would indicate a bug.

Done.


> On Aug. 26, 2013, 5:35 a.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java, line 38
> > <https://reviews.apache.org/r/12480/diff/3/?file=338099#file338099line38>
> >
> >     These should never be null, not even in testing.  It's better to change 
> > the tests to correctly populate the data structure.

AvroGenericRecordWriteable needs the record reader ID. Since we are 
instantiating one here. We will need to set it w/o any checking.
Remove unnecessary null checkings.


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/#review25537
-----------------------------------------------------------


On Aug. 30, 2013, 6:49 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12480/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2013, 6:49 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-4732
>     https://issues.apache.org/jira/browse/HIVE-4732
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> From our performance analysis, we found AvroSerde's schema.equals() call 
> consumed a substantial amount ( nearly 40%) of time. This patch intends to 
> minimize the number schema.equals() calls by pushing the check as late/fewer 
> as possible.
> 
> At first, we added a unique id for each record reader which is then included 
> in every AvroGenericRecordWritable. Then, we introduce two new data 
> structures (one hashset and one hashmap) to store intermediate data to avoid 
> duplicates checkings. Hashset contains all the record readers' IDs that don't 
> need any re-encoding. On the other hand, HashMap contains the already used 
> re-encoders. It works as cache and allows re-encoders reuse. With this 
> change, our test shows nearly 40% reduction in Avro record reading time.
>  
>    
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java 
> ed2a9af 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
> e994411 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
>  66f0348 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 
> 3828940 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 
> 9af751b 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb 
> 
> Diff: https://reviews.apache.org/r/12480/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>

Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde

Reply via email to