[jira] [Commented] (AVRO-1809) I wish to remove optimization from GenericDatumReader.getResolver

Konstantin Usachev (JIRA) Fri, 11 Mar 2016 05:10:11 -0800

    [ 
https://issues.apache.org/jira/browse/AVRO-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190896#comment-15190896
 ]


Konstantin Usachev commented on AVRO-1809:
------------------------------------------

Thank you for your answer. I've benchmarked it and, to my surprise, this
optimization has a quite measurable effect. I tried to recover the performance
in some other way but, unfortunately, failed. The only boost I found comes from
making org.apache.avro.util.WeakIdentityHashMap#reap not synchronized: the map
isn't thread safe anyway, regardless of whether this method is synchronized.
So, my results are:

Without optimization:
{noformat}

                  test name      time   M entries/sec   M bytes/sec   bytes/cycle
               GenericRead:   3268 ms           5,099       197,903        808498
              GenericWrite:   1679 ms           9,925       385,167        808498
        GenericStringsRead:   4483 ms           3,717       396,236       2220873
       GenericStringsWrite:   5609 ms           2,971       316,756       2220873
        GenericNested_Read:   5086 ms           3,277       127,162        808498
       GenericNested_Write:   2711 ms           6,146       238,515        808498
    GenericNestedFake_Read:   1852 ms           8,995       349,092        808498
   GenericNestedFake_Write:    970 ms          17,164       666,121        808498
   GenericWithDefault_Read:   5923 ms           2,814       109,191        808498
GenericWithOutOfOrder_Read:   3297 ms           5,054       196,120        808498
 GenericWithPromotion_Read:   3625 ms           4,597       178,408        808498
{noformat}

Without optimization and with reap not synchronized:
{noformat}
                  test name      time   M entries/sec   M bytes/sec   bytes/cycle
               GenericRead:   2643 ms           6,305       244,702        808498
              GenericWrite:   1675 ms           9,946       385,992        808498
        GenericStringsRead:   3892 ms           4,282       456,431       2220873
       GenericStringsWrite:   5498 ms           3,031       323,117       2220873
        GenericNested_Read:   4495 ms           3,708       143,890        808498
       GenericNested_Write:   2734 ms           6,096       236,570        808498
    GenericNestedFake_Read:   1847 ms           9,021       350,087        808498
   GenericNestedFake_Write:    974 ms          17,111       664,057        808498
   GenericWithDefault_Read:   5218 ms           3,193       123,934        808498
GenericWithOutOfOrder_Read:   2680 ms           6,217       241,265        808498
 GenericWithPromotion_Read:   2931 ms           5,686       220,659        808498
{noformat}

With optimization:
{noformat}
                  test name      time   M entries/sec   M bytes/sec   bytes/cycle
               GenericRead:   2310 ms           7,212       279,888        808498
              GenericWrite:   1551 ms          10,741       416,837        808498
        GenericStringsRead:   3537 ms           4,712       502,264       2220873
       GenericStringsWrite:   5595 ms           2,978       317,512       2220873
        GenericNested_Read:   4453 ms           3,742       145,218        808498
       GenericNested_Write:   2622 ms           6,354       246,606        808498
    GenericNestedFake_Read:   1853 ms           8,992       348,948        808498
   GenericNestedFake_Write:    980 ms          16,989       659,328        808498
   GenericWithDefault_Read:   4571 ms           3,645       141,472        808498
GenericWithOutOfOrder_Read:   2266 ms           7,352       285,313        808498
 GenericWithPromotion_Read:   2673 ms           6,233       241,911        808498
{noformat}
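
To make the size of the effect easier to read off, the read timings above can
be turned into relative slowdowns against the "with optimization" run. The
snippet below is only a small helper sketch: the class and method names are
made up for illustration, and the numbers are copied from the three tables.

```java
import java.util.Locale;

final class ReadSlowdown {
    /** Percentage by which timeMs exceeds baselineMs. */
    static double slowdownPercent(long timeMs, long baselineMs) {
        return 100.0 * (timeMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        String[] names = {
            "GenericRead", "GenericStringsRead", "GenericNested_Read",
            "GenericWithDefault_Read", "GenericWithOutOfOrder_Read",
            "GenericWithPromotion_Read"
        };
        // Times in ms, per row:
        // {without optimization, without optimization + unsynchronized reap,
        //  with optimization}
        long[][] timesMs = {
            {3268, 2643, 2310},
            {4483, 3892, 3537},
            {5086, 4495, 4453},
            {5923, 5218, 4571},
            {3297, 2680, 2266},
            {3625, 2931, 2673}
        };
        for (int i = 0; i < names.length; i++) {
            long baseline = timesMs[i][2];
            System.out.printf(Locale.ROOT, "%-27s %6.1f%% %6.1f%%%n",
                names[i],
                slowdownPercent(timesMs[i][0], baseline),
                slowdownPercent(timesMs[i][1], baseline));
        }
    }
}
```

Each output row shows the slowdown without the optimization, then the slowdown
without the optimization but with reap unsynchronized.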

So, I suppose you are not going to accept this PR. The problem we've
encountered with the Quasar integration is that it makes thread-local values
fiber-local rather than thread-local, so values may migrate between threads
along with their fibers. Because of this optimization, we might therefore
return the same resolver in different threads. At first I wrote a helper that
clears this field through reflection, but that solution isn't good enough.
Maybe you have some ideas?
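
For readers who have not looked at the code, the kind of caching being
discussed can be sketched roughly as follows. This is a simplified,
hypothetical stand-in and not the actual GenericDatumReader source: the class
CreatorPinnedValue, its fields, and the single-value (keyless) cache are all
assumptions made for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of a "creator thread" fast path. */
final class CreatorPinnedValue<V> {
    // Thread that constructed this object; the fast path is reserved for it.
    private final Thread creator = Thread.currentThread();
    // First value handed to the creator thread; read and written only there.
    private V creatorValue;
    // Slow path: one value per thread.
    private final ThreadLocal<V> perThread;

    CreatorPinnedValue(Supplier<V> factory) {
        this.perThread = ThreadLocal.withInitial(factory);
    }

    V get() {
        if (Thread.currentThread() == creator) {
            if (creatorValue == null) {
                creatorValue = perThread.get();
            }
            // Fast path: no thread-local or map lookup on later calls.
            // It assumes that "same Thread object" means "same sole user".
            return creatorValue;
        }
        return perThread.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CreatorPinnedValue<Object> cache = new CreatorPinnedValue<>(Object::new);
        Object first = cache.get();
        System.out.println("creator reuses value: " + (first == cache.get()));

        Object[] seenByOther = new Object[1];
        Thread other = new Thread(() -> seenByOther[0] = cache.get());
        other.start();
        other.join();
        System.out.println("other thread gets its own: " + (seenByOther[0] != first));
    }
}
```

With plain JVM threads each thread ends up with its own value. The fast path
is only safe as long as the thread identity check really identifies a single
user of the cached value, which is the assumption that, per the description
above, stops holding once a fiber library changes how thread-local state moves
between threads.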

> I wish to remove optimization from GenericDatumReader.getResolver
> -----------------------------------------------------------------
>
>                 Key: AVRO-1809
>                 URL: https://issues.apache.org/jira/browse/AVRO-1809
>             Project: Avro
>          Issue Type: Wish
>          Components: java
>            Reporter: Konstantin Usachev
>            Priority: Minor
>
> There is an optimization in
> org.apache.avro.generic.GenericDatumReader.getResolver, where we cache the
> creator thread and its first returned value. First, it looks redundant,
> because it saves three calls to Map.get, which is unmeasurable, especially
> after the Schema hashCode calculation optimization made by the same author
> [~cutting]. It is also not obvious and adds complexity. In addition, caching
> the current thread would be a source of bugs when integrating with
> green-thread libraries (which actually occurred during integration with
> Quasar).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
