[
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700045#comment-16700045
]
ASF GitHub Bot commented on AVRO-2247:
--------------------------------------
unchuckable commented on issue #391: AVRO-2247 - improved java reading
performance with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-441965011
Hi, @rstata.
First of all, thanks for looking into it. It means a lot. I'm sorry about
the license files; I totally forgot about them this time 😞
I pulled your change from your repo and pushed it into mine. No clue what's
up with GitHub and the pull request there; if anybody has a pointer on what I
would need to set in my repo, any advice is welcome.
Invoking the benchmark:
`cd lang/java/benchmark`
`mvn clean package`
`java -jar target/benchmarks.jar` (not the `benchmark-1.9.0-SNAPSHOT`)
By default, it will use 5 warmup iterations and 5 measurement iterations
with 10 seconds each, and do all of that 5 times, which totals up to almost 3
hours, but it can easily be reduced to more reasonable limits (20 minutes),
like:
`java -jar target/benchmarks.jar -wi 3 -i 3 -f 1` (3 iterations for warmup
and measurement and only 1 repetition)
Adding `-e Building` will exclude the building of the DatumReaders from the
benchmark, which currently cuts the total evaluation time in half.
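As a rough cross-check of the run times quoted above, the arithmetic can be sketched as follows. The class and method names are made up for illustration, the count of 20 benchmark methods is an assumption (the comment does not state it), and JVM startup and other JMH overhead are ignored.

```java
// Rough wall-clock estimate for a JMH run. Illustrative only: the number of
// benchmark methods is an assumption, and JMH/JVM overhead is not modelled.
final class JmhRunEstimate {

    /** Seconds for one benchmark method: (warmup + measurement) iterations per fork, times forks. */
    static long secondsPerBenchmark(int warmupIters, int measureIters, int secondsPerIter, int forks) {
        return (long) (warmupIters + measureIters) * secondsPerIter * forks;
    }

    /** Seconds for the whole run, assuming every benchmark method uses the same settings. */
    static long totalSeconds(int benchmarks, int warmupIters, int measureIters,
                             int secondsPerIter, int forks) {
        return benchmarks * secondsPerBenchmark(warmupIters, measureIters, secondsPerIter, forks);
    }

    public static void main(String[] args) {
        int benchmarks = 20; // ASSUMPTION: not stated in the comment

        // Defaults described above: 5 warmup + 5 measurement iterations, 10 s each, 5 forks.
        long defaults = totalSeconds(benchmarks, 5, 5, 10, 5);
        // Reduced run: -wi 3 -i 3 -f 1, still 10 s per iteration.
        long reduced = totalSeconds(benchmarks, 3, 3, 10, 1);

        System.out.println("defaults: " + defaults + " s");
        System.out.println("reduced:  " + reduced + " s");
    }
}
```

Whatever the real benchmark count is, the reduced settings take 60 s instead of 500 s per benchmark method, which matches the drop from almost 3 hours to roughly 20 minutes.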
The current benchmark classes cover only a small excerpt of the cases in Perf.java
(while trying to replicate them as closely as possible). I can gladly add more if it
helps the project; it might make sense to move that to a different ticket
though, I guess.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Improve Java reading performance with a new reader
> --------------------------------------------------
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Martin Jubelgas
> Priority: Major
> Fix For: 1.9.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects
> in Java and am suggesting a new implementation of a DatumReader that improves
> read performance for both generic and specific records by approximately 20%
> (and even more in cases of nested objects with defaults, a case I encounter a
> lot in practical use).
> The key concept is to create a detailed execution plan once at DatumReader
> creation. This execution plan contains all required defaulting/lookup values so
> they need not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData
> instance. The system default is set via the system property
> "org.apache.avro.fastread" (defaults to "false").
> Attached a performance comparison of the existing implementation with the
> proposed one. Will open a pull request with respective code in a bit (not
> including interoperability with the optimizations of AVRO-2090 yet). Please
> let me know your opinion of whether this is worth pursuing further.
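The execution-plan idea from the issue description can be illustrated with a minimal sketch. This is not the code from the pull request and uses no Avro classes: `ReadPlanSketch`, `Step`, `buildPlan` and `read` are hypothetical names, and schemas are reduced to plain field-name lists. Only the property name "org.apache.avro.fastread" and its default of "false" come from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these types are Avro classes.
final class ReadPlanSketch {

    /** One precomputed step: the field to fill and where its value comes from. */
    static final class Step {
        final String field;
        final int sourceIndex;     // position in the written data, or -1 if the writer lacks the field
        final Object defaultValue; // resolved once, when the plan is built

        Step(String field, int sourceIndex, Object defaultValue) {
            this.field = field;
            this.sourceIndex = sourceIndex;
            this.defaultValue = defaultValue;
        }
    }

    /** Builds the plan once: resolves every reader field against the writer's field order. */
    static List<Step> buildPlan(List<String> writerFields, Map<String, Object> readerFieldDefaults) {
        List<Step> plan = new ArrayList<>();
        for (Map.Entry<String, Object> e : readerFieldDefaults.entrySet()) {
            plan.add(new Step(e.getKey(), writerFields.indexOf(e.getKey()), e.getValue()));
        }
        return plan;
    }

    /** Applies the plan to one record; no name lookups or default resolution happen here. */
    static Map<String, Object> read(List<Step> plan, Object[] writtenValues) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Step s : plan) {
            record.put(s.field, s.sourceIndex >= 0 ? writtenValues[s.sourceIndex] : s.defaultValue);
        }
        return record;
    }

    /** Reads the system-wide toggle named in the issue; an unset property counts as false. */
    static boolean fastReadEnabledByDefault() {
        return Boolean.parseBoolean(System.getProperty("org.apache.avro.fastread", "false"));
    }

    public static void main(String[] args) {
        Map<String, Object> readerFields = new LinkedHashMap<>();
        readerFields.put("name", "");
        readerFields.put("id", 0);
        readerFields.put("tags", "none"); // not written by the writer, so the default applies

        List<Step> plan = buildPlan(List.of("id", "name"), readerFields); // built once
        System.out.println(read(plan, new Object[] {42, "avro"}));        // reused per record
        System.out.println("fast read default: " + fastReadEnabledByDefault());
    }
}
```

The cost of matching fields and resolving defaults is paid in `buildPlan`, so the per-record path in `read` is a simple loop over precomputed steps.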
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)