[
https://issues.apache.org/jira/browse/AVRO-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190896#comment-15190896
]
Konstantin Usachev commented on AVRO-1809:
------------------------------------------
Thank you for your answer. I've benchmarked it and, to my surprise, this
optimization has a quite measurable effect. I tried to optimize it in some
other way but, unfortunately, failed. The one exception is a performance boost
we can get by making org.apache.avro.util.WeakIdentityHashMap#reap
non-synchronized: the map isn't thread safe anyway, regardless of whether this
method is synchronized.
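To illustrate the point about reap, here is a simplified model of a weak, identity-keyed map. It is a sketch written for this comment, not Avro's WeakIdentityHashMap source; the class name WeakIdentityMapSketch and its IdentityRef helper are hypothetical. It shows that get and put mutate the same backing HashMap that reap does, so locking reap alone protects nothing.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a weak identity-keyed map; not Avro's code.
class WeakIdentityMapSketch<K, V> {
    private final ReferenceQueue<K> queue = new ReferenceQueue<K>();
    private final Map<IdentityRef<K>, V> backing = new HashMap<IdentityRef<K>, V>();

    // Weak reference that compares by referent identity instead of equals().
    private static final class IdentityRef<K> extends WeakReference<K> {
        private final int hash;

        IdentityRef(K key, ReferenceQueue<K> q) {
            super(key, q);
            hash = System.identityHashCode(key);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdentityRef)) return false;
            Object mine = get();
            return mine != null && mine == ((IdentityRef<?>) o).get();
        }
    }

    // None of these methods take a lock, so concurrent callers can corrupt
    // the backing HashMap whether or not reap() itself is synchronized.
    public V get(K key) {
        reap();
        return backing.get(new IdentityRef<K>(key, queue));
    }

    public V put(K key, V value) {
        reap();
        return backing.put(new IdentityRef<K>(key, queue), value);
    }

    public int size() {
        reap();
        return backing.size();
    }

    // Drops entries whose keys have been garbage collected. Left
    // unsynchronized here: a lock on this method alone would add cost
    // without making the map safe to share between threads.
    private void reap() {
        Object dead;
        while ((dead = queue.poll()) != null) {
            backing.remove(dead);
        }
    }
}
```

Under this model, a caller that needs thread safety has to confine each map to one thread or guard every operation with the same lock.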
So, my results are:
Without optimization:
{noformat}
test name                    time      M entries/sec   M bytes/sec   bytes/cycle
GenericRead:                 3268 ms   5,099           197,903       808498
GenericWrite:                1679 ms   9,925           385,167       808498
GenericStringsRead:          4483 ms   3,717           396,236       2220873
GenericStringsWrite:         5609 ms   2,971           316,756       2220873
GenericNested_Read:          5086 ms   3,277           127,162       808498
GenericNested_Write:         2711 ms   6,146           238,515       808498
GenericNestedFake_Read:      1852 ms   8,995           349,092       808498
GenericNestedFake_Write:     970 ms    17,164          666,121       808498
GenericWithDefault_Read:     5923 ms   2,814           109,191       808498
GenericWithOutOfOrder_Read:  3297 ms   5,054           196,120       808498
GenericWithPromotion_Read:   3625 ms   4,597           178,408       808498
{noformat}
Without optimization, and with reap not synchronized:
{noformat}
test name                    time      M entries/sec   M bytes/sec   bytes/cycle
GenericRead:                 2643 ms   6,305           244,702       808498
GenericWrite:                1675 ms   9,946           385,992       808498
GenericStringsRead:          3892 ms   4,282           456,431       2220873
GenericStringsWrite:         5498 ms   3,031           323,117       2220873
GenericNested_Read:          4495 ms   3,708           143,890       808498
GenericNested_Write:         2734 ms   6,096           236,570       808498
GenericNestedFake_Read:      1847 ms   9,021           350,087       808498
GenericNestedFake_Write:     974 ms    17,111          664,057       808498
GenericWithDefault_Read:     5218 ms   3,193           123,934       808498
GenericWithOutOfOrder_Read:  2680 ms   6,217           241,265       808498
GenericWithPromotion_Read:   2931 ms   5,686           220,659       808498
{noformat}
With optimization:
{noformat}
test name                    time      M entries/sec   M bytes/sec   bytes/cycle
GenericRead:                 2310 ms   7,212           279,888       808498
GenericWrite:                1551 ms   10,741          416,837       808498
GenericStringsRead:          3537 ms   4,712           502,264       2220873
GenericStringsWrite:         5595 ms   2,978           317,512       2220873
GenericNested_Read:          4453 ms   3,742           145,218       808498
GenericNested_Write:         2622 ms   6,354           246,606       808498
GenericNestedFake_Read:      1853 ms   8,992           348,948       808498
GenericNestedFake_Write:     980 ms    16,989          659,328       808498
GenericWithDefault_Read:     4571 ms   3,645           141,472       808498
GenericWithOutOfOrder_Read:  2266 ms   7,352           285,313       808498
GenericWithPromotion_Read:   2673 ms   6,233           241,911       808498
{noformat}
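To put the three runs side by side, the relative change in elapsed time can be computed directly from the "time" column. The snippet below is only arithmetic on timings copied from the tables; the class and method names are illustrative. For GenericRead it gives roughly a 19% reduction from the unsynchronized reap alone and roughly 29% with the optimization, while GenericWrite barely moves.

```java
import java.util.Locale;

// Arithmetic on timings copied from the benchmark tables; names are illustrative.
class BenchmarkDeltaSketch {
    // Percentage by which elapsed time drops going from baselineMs to otherMs.
    static double percentTimeSaved(double baselineMs, double otherMs) {
        return 100.0 * (baselineMs - otherMs) / baselineMs;
    }

    public static void main(String[] args) {
        String[] names = {
            "GenericRead", "GenericWrite",
            "GenericWithDefault_Read", "GenericWithOutOfOrder_Read"
        };
        double[] withoutOpt = {3268, 1679, 5923, 3297}; // without optimization
        double[] unsyncReap = {2643, 1675, 5218, 2680}; // without optimization, reap unsynchronized
        double[] withOpt    = {2310, 1551, 4571, 2266}; // with optimization

        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT,
                "%-28s unsync reap: %5.1f%%   with optimization: %5.1f%%%n",
                names[i],
                percentTimeSaved(withoutOpt[i], unsyncReap[i]),
                percentTimeSaved(withoutOpt[i], withOpt[i]));
        }
    }
}
```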
So, I suppose you are not going to accept this PR. The problem we've
encountered with the Quasar integration is that it makes thread-local values
fiber-local rather than thread-local, so values might migrate between threads
along with their fibers. Because of this optimization, we might therefore
return the same resolver in different threads. At first I made a helper that
erases this field through reflection, but that solution isn't good enough.
Maybe you have some ideas?
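For readers who have not looked at getResolver, the shape of the optimization can be sketched as below. This is a hypothetical, dependency-free model, not the Avro source: CreatorThreadCacheSketch, its Resolver stand-in, and the String schema key are all assumptions made for illustration. The fast path hands out the cached resolver to whoever runs on the creator thread, which is only safe while "current thread" really identifies a single logical task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a creator-thread fast path in front of a
// thread-local cache; not Avro's GenericDatumReader.
class CreatorThreadCacheSketch {
    // Stand-in for a stateful resolver that must not be shared concurrently.
    static final class Resolver {}

    // Slow path: one resolver per thread, keyed by an assumed schema name.
    private static final ThreadLocal<Map<String, Resolver>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    private final Thread creator = Thread.currentThread();
    private final String schema;
    private Resolver creatorResolver; // only read or written on the creator thread

    CreatorThreadCacheSketch(String schema) {
        this.schema = schema;
    }

    Resolver getResolver() {
        Thread current = Thread.currentThread();
        // Fast path: skips the map lookups, trusting thread identity.
        if (current == creator && creatorResolver != null) {
            return creatorResolver;
        }
        Resolver r = CACHE.get().computeIfAbsent(schema, k -> new Resolver());
        if (current == creator) {
            creatorResolver = r;
        }
        return r;
    }

    // Helper for the demo: calls getResolver() on a fresh thread.
    Resolver getResolverOnNewThread() {
        AtomicReference<Resolver> out = new AtomicReference<Resolver>();
        Thread t = new Thread(() -> out.set(getResolver()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out.get();
    }

    public static void main(String[] args) {
        CreatorThreadCacheSketch reader = new CreatorThreadCacheSketch("example");
        Resolver first = reader.getResolver();
        System.out.println("creator reuses resolver: " + (first == reader.getResolver()));
        System.out.println("other thread gets its own: " + (first != reader.getResolverOnNewThread()));
    }
}
```

With plain threads the two checks in main both hold. Once a runtime such as a fiber scheduler changes what thread-local state or the current thread refers to, neither the creator field nor the ThreadLocal cache guarantees exclusive use of a resolver any more.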
> I wish to remove optimization from GenericDatumReader.getResolver
> -----------------------------------------------------------------
>
> Key: AVRO-1809
> URL: https://issues.apache.org/jira/browse/AVRO-1809
> Project: Avro
> Issue Type: Wish
> Components: java
> Reporter: Konstantin Usachev
> Priority: Minor
>
> There is an optimization in
> org.apache.avro.generic.GenericDatumReader.getResolver that caches the
> creator thread and its first returned value. First, it looks redundant,
> because it saves three calls to Map.get, which is unmeasurable, especially
> after the Schema hashcode calculation optimization made by the same author
> [~cutting]. It is also not obvious and adds complexity. In addition, caching
> the current thread could be a source of bugs when integrating with
> green-thread libraries (which actually occurred during integration with
> Quasar).