[ https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662357#comment-16662357 ]
ASF GitHub Bot commented on AVRO-2247:
--------------------------------------
unchuckable opened a new pull request #354: AVRO-2247 - improved java reading
performance with new reader
URL: https://github.com/apache/avro/pull/354
This is the first implementation of a proposed new reader design, as
described in AVRO-2247, that improves reading performance for both generic and
specific records. Please let me know what you think. Classes could be
consolidated into inner classes, but I did not want to spend too much effort on
aesthetics before getting feedback on whether this feature is feasible.
The feature can be enabled per GenericData or SpecificData instance, or by
setting the system property `org.apache.avro.fastread` to `true`. Note that in
order to see its effects in Perf, calls to `new GenericDatumReader(schema)`
would need to be replaced with `GenericData.get().createDatumReader(schema)`
(this change is not included yet).
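The note about Perf follows from how the switch works: a per-instance flag can only take effect if the reader is obtained through a factory method on the data model, because a direct constructor call always yields one fixed class. The sketch below illustrates that pattern under stated assumptions; `Model`, `Reader`, `LegacyReader`, `FastReader`, `setFastReaderEnabled` and `createReader` are hypothetical names, not the classes or methods from the pull request. Only the property name `org.apache.avro.fastread` comes from the text above.

```java
/**
 * Hypothetical sketch: a per-instance flag (defaulted from a system property)
 * selects the reader implementation inside a factory method. All class and
 * method names are illustrative, not Avro's actual API.
 */
public class ReaderFactoryDemo {

    interface Reader {
        String name();
    }

    static final class LegacyReader implements Reader {
        public String name() { return "legacy"; }
    }

    static final class FastReader implements Reader {
        public String name() { return "fast"; }
    }

    static final class Model {
        // Boolean.getBoolean returns true only if the system property exists
        // and equals "true" (ignoring case), so the default is false.
        private boolean fastReaderEnabled = Boolean.getBoolean("org.apache.avro.fastread");

        void setFastReaderEnabled(boolean enabled) { this.fastReaderEnabled = enabled; }

        // The factory consults the per-instance flag; a call site that uses
        // "new LegacyReader()" directly never sees the switch.
        Reader createReader() {
            return fastReaderEnabled ? new FastReader() : new LegacyReader();
        }
    }

    public static void main(String[] args) {
        Model model = new Model();
        System.out.println("default: " + model.createReader().name());
        model.setFastReaderEnabled(true);
        System.out.println("enabled: " + model.createReader().name());
        // Direct construction bypasses the flag entirely.
        System.out.println("direct:  " + new LegacyReader().name());
    }
}
```

Run without `-Dorg.apache.avro.fastread=true`, the first line reports the legacy reader; with the property set, the model starts out using the fast one.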
> Improve Java reading performance with a new reader
> --------------------------------------------------
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Martin Jubelgas
> Priority: Major
> Fix For: 1.9.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects
> in Java and am suggesting a new implementation of a DatumReader that improves
> read performance for both generic and specific records by approximately 20%
> (and even more in cases of nested objects with defaults, a case I encounter a
> lot in practical use).
> The key concept is to create a detailed execution plan once per DatumReader.
> This execution plan contains all required default and lookup values, so they
> need not be looked up during object traversal while reading.
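> To make the execution-plan idea concrete, here is a minimal sketch under
> simplifying assumptions: schemas are reduced to lists of field names, decoded
> values arrive as an Iterator in writer-field order (standing in for a real
> Avro Decoder), and `Plan`, `ReadStep` and `DefaultStep` are hypothetical
> names. It shows the technique described above (resolve writer/reader fields
> and defaults once, then replay the steps per record), not the code in the
> pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: build a read plan once, then execute it per record. */
public class ExecutionPlanDemo {

    /** One step per writer field: target reader slot, or -1 to skip the value. */
    static final class ReadStep {
        final int readerSlot;
        ReadStep(int readerSlot) { this.readerSlot = readerSlot; }
    }

    /** A reader field missing from the writer schema, filled from its default. */
    static final class DefaultStep {
        final int readerSlot;
        final Object defaultValue;
        DefaultStep(int readerSlot, Object defaultValue) {
            this.readerSlot = readerSlot;
            this.defaultValue = defaultValue;
        }
    }

    static final class Plan {
        final List<ReadStep> readSteps = new ArrayList<>();
        final List<DefaultStep> defaultSteps = new ArrayList<>();
        final int readerFieldCount;

        // All name lookups and default resolution happen here, once.
        Plan(List<String> writerFields, List<String> readerFields, Map<String, Object> defaults) {
            readerFieldCount = readerFields.size();
            for (String w : writerFields) {
                readSteps.add(new ReadStep(readerFields.indexOf(w))); // -1 if reader lacks it
            }
            for (int slot = 0; slot < readerFields.size(); slot++) {
                String r = readerFields.get(slot);
                if (!writerFields.contains(r)) {
                    defaultSteps.add(new DefaultStep(slot, defaults.get(r)));
                }
            }
        }

        // Per record: no schema lookups, just replay the precomputed steps.
        Object[] read(Iterator<Object> in) {
            Object[] record = new Object[readerFieldCount];
            for (ReadStep step : readSteps) {
                Object value = in.next(); // always consume the writer's value
                if (step.readerSlot >= 0) {
                    record[step.readerSlot] = value;
                }
            }
            for (DefaultStep step : defaultSteps) {
                record[step.readerSlot] = step.defaultValue;
            }
            return record;
        }
    }

    public static void main(String[] args) {
        List<String> writer = List.of("id", "legacyFlag", "name");
        List<String> reader = List.of("name", "id", "country");
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("country", "unknown");

        Plan plan = new Plan(writer, reader, defaults); // resolved once
        Object[] r1 = plan.read(List.<Object>of(1, true, "a").iterator());
        Object[] r2 = plan.read(List.<Object>of(2, false, "b").iterator());
        System.out.println(Arrays.toString(r1)); // [a, 1, unknown]
        System.out.println(Arrays.toString(r2)); // [b, 2, unknown]
    }
}
```

> The per-record loop touches only integer slots and precomputed defaults; the
> cost of matching names and resolving defaults is paid once when the plan is
> built.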
> The reader implementation can be enabled and disabled per GenericData
> instance. The system-wide default is set via the system property
> "org.apache.avro.fastread" (defaults to "false").
> Attached is a performance comparison of the existing implementation with the
> proposed one. I will open a pull request with the corresponding code shortly
> (not yet including interoperability with the optimizations of AVRO-2090).
> Please let me know whether you think this is worth pursuing further.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)