[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678735#comment-16678735
 ] 

ASF GitHub Bot commented on AVRO-2247:
--------------------------------------

unchuckable commented on issue #354: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/354#issuecomment-436765147
 
 
   Rebased as requested, and added a small change to Perf.java so that it uses 
`GenericData.get().createDatumReader( schema )` instead of `new 
GenericDatumReader( schema )`.
   Also switched from `WeakHashMap` to `WeakIdentityHashMap` for schema 
lookups, for an additional speedup.
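
One plausible reason an identity-keyed map helps here is that it never calls the key's structural `equals()`/`hashCode()`. The sketch below illustrates that with the standard library only. `FakeSchema` and `IdentityCache` are hypothetical stand-ins written for this example; they are not Avro's `Schema` or its `WeakIdentityHashMap`.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class IdentityCacheSketch {

    /** Hypothetical stand-in for a schema whose equals()/hashCode() inspect its content. */
    static final class FakeSchema {
        static int structuralCalls = 0; // counts equals()/hashCode() invocations
        final String definition;

        FakeSchema(String definition) { this.definition = definition; }

        @Override public boolean equals(Object o) {
            structuralCalls++;
            return o instanceof FakeSchema && ((FakeSchema) o).definition.equals(definition);
        }

        @Override public int hashCode() {
            structuralCalls++;
            return definition.hashCode();
        }
    }

    /** Minimal weak, identity-keyed cache: keys are compared by reference (==) only. */
    static final class IdentityCache<K, V> {
        private static final class Key<K> extends WeakReference<K> {
            private final int hash;
            Key(K referent) { super(referent); this.hash = System.identityHashCode(referent); }
            @Override public int hashCode() { return hash; }
            @Override public boolean equals(Object o) {
                if (this == o) return true;
                if (!(o instanceof Key)) return false;
                Object mine = get();
                return mine != null && mine == ((Key<?>) o).get();
            }
        }

        private final Map<Key<K>, V> map = new HashMap<>();

        V get(K key) { return map.get(new Key<>(key)); }
        void put(K key, V value) { map.put(new Key<>(key), value); }
        // A production version would also purge entries whose keys were collected
        // (for example via a ReferenceQueue); omitted to keep the sketch short.
    }

    /** Number of structural equals()/hashCode() calls caused by one put and one get. */
    static int callsWithWeakHashMap(FakeSchema schema) {
        FakeSchema.structuralCalls = 0;
        Map<FakeSchema, String> cache = new WeakHashMap<>();
        cache.put(schema, "plan");
        cache.get(schema);
        return FakeSchema.structuralCalls;
    }

    /** Same measurement for the identity-keyed cache. */
    static int callsWithIdentityCache(FakeSchema schema) {
        FakeSchema.structuralCalls = 0;
        IdentityCache<FakeSchema, String> cache = new IdentityCache<>();
        cache.put(schema, "plan");
        cache.get(schema);
        return FakeSchema.structuralCalls;
    }

    public static void main(String[] args) {
        FakeSchema schema = new FakeSchema("{\"type\":\"string\"}");
        System.out.println("WeakHashMap structural calls:   " + callsWithWeakHashMap(schema));
        System.out.println("IdentityCache structural calls: " + callsWithIdentityCache(schema));
    }
}
```

The trade-off is that two structurally equal but distinct instances become separate keys, so the benefit depends on callers reusing the same schema object.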
   
   As noted, I am curious about any feedback and willing to work on implementation 
and style details. I just need to know whether this is worth pursuing.
   
   With the current changes, I get the following Perf.java comparison:
   
   test name | time (fast read disabled) | time (fast read enabled)
   ----|-----|----
   FooBarSpecificRecordTestRead | 5534 ms | 3115 ms
   GenericRead | 4711 ms | 3422 ms
   GenericStringsRead | 4902 ms | 3695 ms
   GenericNested_Read | 7190 ms | 4961 ms
   GenericNestedFake_Read | 2581 ms | 2461 ms
   GenericWithDefault_Read | 8400 ms | 3746 ms
   GenericWithOutOfOrder_Read | 4627 ms | 3549 ms
   GenericWithPromotion_Read | 4991 ms | 3673 ms
   GenericOneTimeDecoderUse_Read | 4618 ms | 3496 ms
   GenericOneTimeReaderUse_Read | 7035 ms | 4693 ms
   GenericOneTimeUse_Read | 6965 ms | 4721 ms
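
To make the table easier to compare with the "approximately 20%" figure in the issue description, the sketch below turns each pair of timings into the percentage of read time saved. The numbers are copied from the table; the class and method names are made up for this example. By this measure the savings range from roughly 5% (GenericNestedFake_Read) to roughly 55% (GenericWithDefault_Read).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerfSpeedup {

    /** Percentage of read time saved: (before - after) / before * 100. */
    static double percentSaved(long beforeMs, long afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        // {fast read disabled, fast read enabled}, in ms, copied from the table
        Map<String, long[]> results = new LinkedHashMap<>();
        results.put("FooBarSpecificRecordTestRead", new long[] {5534, 3115});
        results.put("GenericRead", new long[] {4711, 3422});
        results.put("GenericStringsRead", new long[] {4902, 3695});
        results.put("GenericNested_Read", new long[] {7190, 4961});
        results.put("GenericNestedFake_Read", new long[] {2581, 2461});
        results.put("GenericWithDefault_Read", new long[] {8400, 3746});
        results.put("GenericWithOutOfOrder_Read", new long[] {4627, 3549});
        results.put("GenericWithPromotion_Read", new long[] {4991, 3673});
        results.put("GenericOneTimeDecoderUse_Read", new long[] {4618, 3496});
        results.put("GenericOneTimeReaderUse_Read", new long[] {7035, 4693});
        results.put("GenericOneTimeUse_Read", new long[] {6965, 4721});

        for (Map.Entry<String, long[]> e : results.entrySet()) {
            long[] t = e.getValue();
            System.out.printf("%-32s %5.1f%% less time%n", e.getKey(), percentSaved(t[0], t[1]));
        }
    }
}
```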
   
   
   
   



> Improve Java reading performance with a new reader
> --------------------------------------------------
>
>                 Key: AVRO-2247
>                 URL: https://issues.apache.org/jira/browse/AVRO-2247
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Martin Jubelgas
>            Priority: Major
>             Fix For: 1.9.0
>
>         Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on the decoding of Avro objects 
> in Java and am proposing a new DatumReader implementation that improves 
> read performance for both generic and specific records by approximately 20% 
> (and by even more for nested objects with defaults, a case I encounter a 
> lot in practical use).
> The key concept is to create a detailed execution plan once per DatumReader. This 
> execution plan contains all required defaulting/lookup values, so they need 
> not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system-wide default is set via the system property 
> "org.apache.avro.fastread" (defaults to "false").
> I have attached a performance comparison of the existing implementation with the 
> proposed one. I will open a pull request with the respective code shortly (not 
> yet including interoperability with the optimizations of AVRO-2090). Please 
> let me know whether you think this is worth pursuing further.
>  
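
The "execution plan" idea in the description can be illustrated with a small, self-contained sketch. It is not the code from the pull request. `Field`, `Step`, `buildPlan` and `read` are hypothetical names, and the input is a plain iterator of values in writer order rather than an Avro decoder. The point is only that field matching and default resolution happen once, when the plan is built, so that reading is a straight walk over precomputed steps. The `fastReadEnabled` helper assumes the switch is read as a JVM system property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReadPlanSketch {

    /** Hypothetical reader-side field: a name plus the default used when the writer lacks it. */
    static final class Field {
        final String name;
        final Object defaultValue;
        Field(String name, Object defaultValue) { this.name = name; this.defaultValue = defaultValue; }
    }

    /** One precomputed step: consumes input and/or fills a field of the record being built. */
    interface Step {
        void apply(Iterator<Object> input, Map<String, Object> record);
    }

    /** Resolves writer fields against reader fields once; the steps need no lookups later. */
    static List<Step> buildPlan(List<String> writerFields, List<Field> readerFields) {
        Set<String> readerNames = new HashSet<>();
        for (Field f : readerFields) readerNames.add(f.name);
        Set<String> writerNames = new HashSet<>(writerFields);

        List<Step> plan = new ArrayList<>();
        for (String name : writerFields) {
            if (readerNames.contains(name)) {
                plan.add((in, rec) -> rec.put(name, in.next())); // read and keep
            } else {
                plan.add((in, rec) -> in.next());                // read and discard
            }
        }
        for (Field f : readerFields) {
            if (!writerNames.contains(f.name)) {
                Object resolvedDefault = f.defaultValue;         // resolved once, at plan time
                plan.add((in, rec) -> rec.put(f.name, resolvedDefault));
            }
        }
        return plan;
    }

    /** Reading is a walk over the plan: no name matching or default lookup happens here. */
    static Map<String, Object> read(List<Step> plan, Iterator<Object> input) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Step step : plan) step.apply(input, record);
        return record;
    }

    /** Assumed form of the switch: a system property that defaults to "false". */
    static boolean fastReadEnabled() {
        return Boolean.parseBoolean(System.getProperty("org.apache.avro.fastread", "false"));
    }

    public static void main(String[] args) {
        List<Step> plan = buildPlan(
            Arrays.asList("id", "legacy", "name"),
            Arrays.asList(new Field("id", null), new Field("name", null),
                          new Field("country", "unknown")));

        // The same plan can be reused for every record written with this field list.
        Map<String, Object> record =
            read(plan, Arrays.<Object>asList(42, "dropped", "Ada").iterator());
        System.out.println(record); // prints {id=42, name=Ada, country=unknown}
        System.out.println("fast read enabled: " + fastReadEnabled());
    }
}
```

In this sketch the cost of building the plan is paid once per writer/reader pairing, which is why caching plans by schema matters for readers that are created often.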



