Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

via GitHub Wed, 06 Dec 2023 00:14:09 -0800


vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1842384044


   > is this new approach helping cover more cases maybe?
    
   Yes, Previous patch only apply copy for level 0 's acc in 
``Lucene99SkipWriter.bufferSkip``,  Current patch apply copy for all level's 
acc in ``Lucene99SkipWriter.writeSkipData`` by adding a flag to indicates 
wether this is an empty acc.
   
   > when knowing when the accumulator is empty is the responsibility of the 
caller rather than CompetitiveFreqNormAccumulator.
   
   I will move the copy/addAll selection to caller.
   
   Before that, there are some performance data I want to share:
   
   First of all, I measured method execution time(without assertion):
   Method | ns | diff
   -- | -- | --
   addAll | 48291 | -
   copy | 17000 |  +64.7%
   
   And add the implementation to JMH to enable JIT's optimization(auto-unroll, 
auto-vectorization):
   ````
   Benchmark                            (size)   Mode  Cnt   Score   Error   
Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  17.311 ± 4.468  
ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  21.417 ± 7.009  
ops/us
   
   Benchmark                            (size)   Mode  Cnt   Score   Error   
Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  19.943 ± 4.831  
ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  16.516 ± 1.904  
ops/us
   ````
   
   The speedup is not stable, so I split the implementation into array part and 
treeSet part.
   1. Array part: 
   ````
   Benchmark                              (size)   Mode  Cnt    Score    Error  
 Units
   VectorUtilBenchmark.arrayCopy             256  thrpt   15  141.356 ±  3.640  
ops/us 
   VectorUtilBenchmark.arrayMax              256  thrpt   15   77.875 ±  1.310  
ops/us 
   
   Benchmark                              (size)   Mode  Cnt    Score    Error  
 Units
   VectorUtilBenchmark.arrayCopy             256  thrpt   15  138.484 ±  5.038  
ops/us
   VectorUtilBenchmark.arrayMax              256  thrpt   15   78.250 ±  0.992  
ops/us
   ````
   ArrayMax is the baseline that iterates the array and get the max value. 
   ArrayCopy uses ``System.arraycopy``, which get a explicit speedup.
   
   2. TreeSet part:
   ````
   Benchmark                          (size)   Mode  Cnt   Score    Error   
Units
   VectorUtilBenchmark.treeAddAll        256  thrpt   15  27.673 ± 19.096  
ops/us
   VectorUtilBenchmark.treeScalarAdd     256  thrpt   15  34.888 ± 14.321  
ops/us
   
   Benchmark                          (size)   Mode  Cnt   Score    Error   
Units
   VectorUtilBenchmark.treeAddAll        256  thrpt   15  25.342 ±  4.151  
ops/us
   VectorUtilBenchmark.treeScalarAdd     256  thrpt   15  42.673 ± 19.396  
ops/us
   ````
   I also measured the method performance without JMH:
   Method | ns | diff
   -- | -- | --
   treeScalarAdd | 36000 | -
   treeAddAll | 26875 |  +25.3%
   
   The method performance indicates that `TreeSet.addAll` get a better 
performance, but JMH benchmark result indicates that scalarAdd(baseline) get a 
better performance than `TreeSet.addAll`, which is unexpected for me. I am 
still working on 
   finding out the reason. 
   
   Finally, I modified `acc.copy` to just copy the array but use the original 
iterate add for treeSet, and measured the performance: 
   ````
   Benchmark                            (size)   Mode  Cnt   Score    Error   
Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  16.180 ±  3.014  
ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  31.807 ± 12.811  
ops/us
   
   Benchmark                            (size)   Mode  Cnt   Score   Error   
Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  20.414 ± 3.693  
ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  24.432 ± 6.862  
ops/us
   ````
   Now `acc.copy`  get a speedup.
   Any idea about that? @jpountz 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Reply via email to