[ https://issues.apache.org/jira/browse/AVRO-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608343#comment-13608343 ]
Alexandre Normand commented on AVRO-1268:
-----------------------------------------
I made a few more incremental changes to my version and got some further
performance improvements. I'm still using a Map<Schema, Object> to keep state.
After looking at some other alternatives, it still seems like a good choice: a
GenericDatumReader keeps using the same Schema instance (and, consequently, the
same field schema instances), so the key passed to a lookup is the very object
stored in the map, which makes the lookup faster than one that has to call
#equals().
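To illustrate the identity-key point, here is a minimal sketch that does not use Avro at all. `FakeSchema` is a hypothetical stand-in for a schema object, and the `equalsCalls` counter exists only for the demonstration. It relies on the behaviour of OpenJDK's `HashMap`, which compares a stored key to the lookup key by reference before falling back to `equals()`; the actual patch may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

class IdentityLookupDemo {

    /** Hypothetical stand-in for a schema object; this is NOT Avro's Schema class. */
    static final class FakeSchema {
        static int equalsCalls = 0; // counts how often equals() runs (demo only)
        final String name;

        FakeSchema(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof FakeSchema && ((FakeSchema) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    /** Returns how many times FakeSchema.equals() ran during one get() with the given key. */
    static int equalsCallsForLookup(Map<FakeSchema, Object> state, FakeSchema key) {
        FakeSchema.equalsCalls = 0;
        state.get(key);
        return FakeSchema.equalsCalls;
    }

    public static void main(String[] args) {
        FakeSchema schema = new FakeSchema("record");
        Map<FakeSchema, Object> state = new HashMap<>();
        state.put(schema, "cached state");

        // Lookup with the same instance that was used as the key.
        System.out.println("same instance: " + equalsCallsForLookup(state, schema));
        // Lookup with a distinct but equal instance.
        System.out.println("equal copy:    " + equalsCallsForLookup(state, new FakeSchema("record")));
    }
}
```

The lookup with the same instance should report no `equals()` calls, while the equal copy needs one. Note that `hashCode()` is still invoked in both cases, so the saving is only the `equals()` comparison.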
New numbers:
{code}
Executing tests:
[IntTest, SmallLongTest, LongTest, FloatTest, DoubleTest, BoolTest, BytesTest,
StringTest, ArrayTest, MapTest, RecordTest, ValidatingRecord, ResolvingRecord,
RecordWithDefault, RecordWithOutOfOrder, RecordWithPromotion, GenericTest,
GenericStrings, GenericNested, GenericNestedFake, GenericWithDefault,
GenericWithOutOfOrder, GenericWithPromotion, GenericOneTimeDecoderUse,
GenericOneTimeReaderUse, GenericOneTimeUse, FooBarSpecificRecordTest]
readTests:true
writeTests:true
cycles=800
test name   time   M entries/sec   M bytes/sec   bytes/cycle
IntRead: 718 ms 278.233 700.395 629325
IntWrite: 1459 ms 137.044 344.981 629325
SmallLongRead: 778 ms 256.954 646.831 629325
SmallLongWrite: 1448 ms 138.061 347.541 629325
LongRead: 1695 ms 117.930 515.283 1092353
LongWrite: 2608 ms 76.660 334.959 1092353
FloatRead: 369 ms 541.313 2165.252 1000000
FloatWrite: 1185 ms 168.692 674.768 1000000
DoubleRead: 349 ms 572.961 4583.687 2000000
DoubleWrite: 1897 ms 105.408 843.267 2000000
BooleanRead: 254 ms 786.909 786.909 250000
BooleanWrite: 521 ms 383.473 383.473 250000
BytesRead: 1597 ms 25.032 889.589 1776937
BytesWrite: 2069 ms 19.329 686.930 1776937
StringRead: 8347 ms 4.792 170.685 1780910
StringWrite: 8405 ms 4.759 169.496 1780910
ArrayRead: 399 ms 500.144 2000.587 1000006
ArrayWrite: 1154 ms 173.186 692.747 1000006
MapRead: 1337 ms 149.536 747.683 1250004
MapWrite: 2125 ms 94.090 470.449 1250004
RecordRead: 627 ms 53.122 2061.677 1617069
RecordWrite: 1978 ms 16.846 653.812 1617069
ValidatingRecordRead: 3808 ms 8.752 339.682 1617069
ValidatingRecordWrite: 3615 ms 9.219 357.807 1617069
ResolvingRecordRead: 4189 ms 7.956 308.777 1617069
RecordWithDefaultRead: 11088 ms 3.006 116.664 1617069
RecordWithOutOfOrderRead: 3307 ms 10.077 391.092 1617069
RecordWithPromotionRead: 3575 ms 9.323 361.820 1617069
GenericRead: 4979 ms 3.347 129.888 808498
GenericWrite: 3076 ms 5.418 210.253 808498
GenericStringsRead: 6826 ms 2.441 260.269 2220873
GenericStringsWrite: 12930 ms 1.289 137.399 2220873
GenericNested_Read: 7881 ms 2.115 82.064 808498
GenericNested_Write: 4710 ms 3.538 137.296 808498
GenericNestedFake_Read: 3348 ms 4.977 193.162 808498
GenericNestedFake_Write: 1503 ms 11.085 430.191 808498
GenericWithDefault_Read: 9872 ms 1.688 65.518 808498
GenericWithOutOfOrder_Read: 4988 ms 3.341 129.658 808498
GenericWithPromotion_Read: 5220 ms 3.193 123.900 808498
GenericOneTimeDecoderUse_Read: 4979 ms 3.347 129.897 808498
GenericOneTimeReaderUse_Read: 7130 ms 2.337 90.708 808498
GenericOneTimeUse_Read: 7147 ms 2.332 90.492 808498
FooBarSpecificRecordTestRead: 37078 ms 0.449 75.113 3481319
FooBarSpecificRecordTestWrite: 29507 ms 0.565 94.384 3481319
{code}
FooBarSpecificRecordTestRead is ~5.3% slower than without the patch and
FooBarSpecificRecordTestWrite is ~2% slower.
GenericStringsRead is ~4.2% faster and GenericStringsWrite is ~2.7% faster.
All tests are still passing. I'm going to make a pass over the patch to clean
it up and make sure I haven't broken API compatibility, but I'd like to get
some feedback on these results.
Doug, what do you think of these last numbers?
> Add java-class, java-key-class and java-element-class support for stringable
> types to SpecificData
> --------------------------------------------------------------------------------------------------
>
> Key: AVRO-1268
> URL: https://issues.apache.org/jira/browse/AVRO-1268
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.4
> Reporter: Alexandre Normand
> Assignee: Alexandre Normand
> Priority: Minor
> Fix For: 1.7.5
>
> Attachments: AVRO-1268-needs-work.patch, AVRO-1268.patch,
> AVRO-1268.patch, AVRO-1268.sh, GenericStringsPerf.patch
>
>
> Stringable types are java classes that can be serialized through strings
> (which require a single string constructor and a valid toString()
> implementation). ReflectData currently has support for stringable types but
> it would be desirable to get this feature with SpecificData.
> The work involves changes to the SpecificCompiler (depends on {{@java-class}}
> support in AVRO-1267) to generate the specific sources with the proper java
> type as well as moving the ReflectDatumReader and ReflectDatumWriter to read
> the java-class/java-key-class and java-element-class properties.