> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > Do you have after-optimization performance numbers? Can you add a test to
> > verify that the reencoder cache is working correctly? Feed in a record
> > with one UUID, then another with a different one, and verify that the cache
> > has two elements. Adding a third record with the original UUID shouldn't
> > increase the size of the cache. Also verify that adding n records, all with
> > the same schema, creates only one reencoder.

Yes, we have numbers after the optimization. For example, each record used to
take nearly 50 microseconds; after this patch, it takes nearly 31
microseconds (a reduction of roughly 38%).
Added the test case as proposed.
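For anyone who wants the shape of such a test without the Hive test harness,
here is a minimal, self-contained sketch of the three checks requested above.
`ReEncoder`, `ReEncoderCache` and `ReEncoderCacheCheck` are hypothetical names,
not the classes in the patch; reader IDs are assumed to be `java.util.UUID`
values and schemas are modeled as plain strings so it runs without Avro.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Self-check mirroring the three cases requested in the review (hypothetical sketch).
class ReEncoderCacheCheck {
    public static void main(String[] args) {
        UUID first = new UUID(0L, 1L);
        UUID second = new UUID(0L, 2L);

        // Two records from readers with different IDs -> two cache entries.
        ReEncoderCache cache = new ReEncoderCache();
        cache.get(first, "readerSchema");
        cache.get(second, "readerSchema");
        check(cache.size() == 2, "two distinct IDs should give two entries");

        // A third record with the original ID must not grow the cache.
        cache.get(first, "readerSchema");
        check(cache.size() == 2, "a repeated ID should not grow the cache");

        // n records from the same reader should build exactly one re-encoder.
        ReEncoderCache fresh = new ReEncoderCache();
        for (int i = 0; i < 1_000; i++) {
            fresh.get(first, "readerSchema");
        }
        check(fresh.createdCount() == 1, "same reader should create only one re-encoder");
        System.out.println("all cache checks passed");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}

// Hypothetical stand-in for the patch's re-encoder; it only remembers the target schema.
class ReEncoder {
    final String readerSchema;

    ReEncoder(String readerSchema) {
        this.readerSchema = readerSchema;
    }
}

// Hypothetical cache keyed by record-reader ID that counts how many re-encoders it builds.
class ReEncoderCache {
    private final Map<UUID, ReEncoder> cache = new HashMap<>();
    private int created = 0;

    ReEncoder get(UUID readerId, String readerSchema) {
        // computeIfAbsent runs the factory only on a miss, so `created` counts misses.
        return cache.computeIfAbsent(readerId, id -> {
            created++;
            return new ReEncoder(readerSchema);
        });
    }

    int size() {
        return cache.size();
    }

    int createdCount() {
        return created;
    }
}
```

Counting creations separately from the map size lets the "n records, one
reencoder" case be asserted directly rather than inferred.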


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java, 
> > line 66
> > <https://reviews.apache.org/r/12480/diff/1/?file=320688#file320688line66>
> >
> >     verifiedRecordReaders -> noReencodingNeeded ?

Done


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java, 
> > line 155
> > <https://reviews.apache.org/r/12480/diff/1/?file=320688#file320688line155>
> >
> >     readability: pull out getRecordReaderID into its own var

Done


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java,
> >  line 78
> > <https://reviews.apache.org/r/12480/diff/1/?file=320689#file320689line78>
> >
> >     Need to write out the uuid too

Done


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java,
> >  line 92
> > <https://reviews.apache.org/r/12480/diff/1/?file=320689#file320689line92>
> >
> >     Need to read in the uuid too

Done
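Both of the write/read comments above come down to keeping the Writable's
serialization symmetric: whatever write() emits, readFields() must consume in
the same order. The sketch below illustrates this with a hypothetical
`RecordWithReaderId` class (not the actual AvroGenericRecordWritable). Its
method names follow Hadoop's Writable contract, but it does not depend on
Hadoop; the payload is a plain byte array and the ID is assumed to be a
`java.util.UUID` written as two longs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.UUID;

class UuidRoundTrip {
    public static void main(String[] args) throws IOException {
        RecordWithReaderId original = new RecordWithReaderId(new byte[] {1, 2, 3}, new UUID(42L, 7L));

        // Serialize, then deserialize into a fresh instance.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        original.write(out);
        out.flush();
        RecordWithReaderId copy = new RecordWithReaderId();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(copy.getRecordReaderId().equals(original.getRecordReaderId())); // true
        System.out.println(Arrays.equals(copy.getPayload(), original.getPayload()));       // true
    }
}

// Hypothetical, simplified stand-in for a Writable that carries its reader's ID.
class RecordWithReaderId {
    private byte[] payload = new byte[0];
    private UUID recordReaderId;

    RecordWithReaderId() {
        // No-arg constructor, since Writable instances are typically created empty and then filled by readFields().
    }

    RecordWithReaderId(byte[] payload, UUID recordReaderId) {
        this.payload = payload;
        this.recordReaderId = recordReaderId;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        // Without these two writes the reader ID would be lost on a round trip.
        out.writeLong(recordReaderId.getMostSignificantBits());
        out.writeLong(recordReaderId.getLeastSignificantBits());
    }

    void readFields(DataInput in) throws IOException {
        payload = new byte[in.readInt()];
        in.readFully(payload);
        // Read the ID back in exactly the order write() emitted it.
        long mostSignificant = in.readLong();
        long leastSignificant = in.readLong();
        recordReaderId = new UUID(mostSignificant, leastSignificant);
    }

    byte[] getPayload() {
        return payload;
    }

    UUID getRecordReaderId() {
        return recordReaderId;
    }
}
```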


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/#review23113
-----------------------------------------------------------


On July 11, 2013, 10:31 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12480/
> -----------------------------------------------------------
> 
> (Updated July 11, 2013, 10:31 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-4732
>     https://issues.apache.org/jira/browse/HIVE-4732
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> From our performance analysis, we found that AvroSerde's schema.equals() call
> consumed a substantial share of the time (nearly 40%). This patch minimizes
> the number of schema.equals() calls by performing the check as late and as
> rarely as possible.
> 
> First, we add a unique ID for each record reader, which is then included in
> every AvroGenericRecordWritable. Then we introduce two new data structures (a
> HashSet and a HashMap) that store intermediate results to avoid duplicate
> checks. The HashSet contains the IDs of all record readers that don't need any
> re-encoding. The HashMap holds the re-encoders that have already been created;
> it works as a cache and allows re-encoders to be reused. With this change, our
> tests show a nearly 40% reduction in Avro record reading time.
>  
>    
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java dbc999f
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java c85ef15
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb
> 
> Diff: https://reviews.apache.org/r/12480/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>
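The lookup order described in the review request quoted above can be sketched
as follows. This is a minimal illustration under stated assumptions, not the
patch itself: schemas are modeled as strings, the re-encoder is a hypothetical
`SketchReEncoder`, reader IDs are assumed to be `java.util.UUID`, and each
reader is assumed to see a single writer schema for its whole lifetime. The set
uses the `noReencodingNeeded` name suggested in the review.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

class SchemaCheckDemo {
    public static void main(String[] args) {
        ReencodeDecider decider = new ReencodeDecider("readerSchema");
        UUID sameSchemaReader = new UUID(0L, 1L);
        UUID otherSchemaReader = new UUID(0L, 2L);
        for (int i = 0; i < 1_000; i++) {
            decider.prepare(sameSchemaReader, "readerSchema");
            decider.prepare(otherSchemaReader, "writerSchemaV1");
        }
        System.out.println(decider.schemaComparisons()); // 2
    }
}

// Hypothetical stand-in for a re-encoder from a writer schema to the reader schema.
class SketchReEncoder {
    final String fromSchema;
    final String toSchema;

    SketchReEncoder(String fromSchema, String toSchema) {
        this.fromSchema = fromSchema;
        this.toSchema = toSchema;
    }
}

class ReencodeDecider {
    private final String readerSchema;
    // Reader IDs whose writer schema already matched the reader schema.
    private final Set<UUID> noReencodingNeeded = new HashSet<>();
    // Re-encoders already built, keyed by reader ID, so they can be reused.
    private final Map<UUID, SketchReEncoder> reEncoderCache = new HashMap<>();
    private int schemaComparisons = 0;

    ReencodeDecider(String readerSchema) {
        this.readerSchema = readerSchema;
    }

    /** Returns null if the record can be used as-is, otherwise the re-encoder to apply. */
    SketchReEncoder prepare(UUID recordReaderId, String writerSchema) {
        if (noReencodingNeeded.contains(recordReaderId)) {
            return null;
        }
        SketchReEncoder cached = reEncoderCache.get(recordReaderId);
        if (cached != null) {
            return cached;
        }
        // First record from this reader: pay for the schema comparison once.
        schemaComparisons++;
        if (writerSchema.equals(readerSchema)) {
            noReencodingNeeded.add(recordReaderId);
            return null;
        }
        SketchReEncoder reEncoder = new SketchReEncoder(writerSchema, readerSchema);
        reEncoderCache.put(recordReaderId, reEncoder);
        return reEncoder;
    }

    int schemaComparisons() {
        return schemaComparisons;
    }
}
```

However many records flow through, the comparison runs once per reader, which
is the effect the request is after. Plain HashSet and HashMap are not
thread-safe, so the sketch assumes each instance is used from a single thread.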
