> This provides a solid speedup of about 3-4x over the Java implementation.
> 
> I have a vectorized version of this which uses a bunch of tricks to speed it 
> up, but it's complex and can still be improved. We're getting close to ramp 
> down, so I'm submitting this simple intrinsic so that we can get it reviewed 
> in time.
> 
> Benchmarks:
> 
> 
> ThunderX (2, I think):
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  4516486.795 ± 419624.224  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3  1228542.774 ± 202815.694  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   316051.912 ±  23066.449  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3    20649.561 ±   1094.687  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      310.564 ±     31.053  ops/s
> 
> Apple M1:
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score        Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ± 849843.905  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±  63417.224  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±  29208.265  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±   1975.998  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±     40.240  ops/s
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score        Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ± 184757.721  ops/s
> Poly1305DigestBench.updateBytes         256              th...

Andrew Haley has updated the pull request incrementally with one additional 
commit since the last revision:

  Comment change only

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14085/files
  - new: https://git.openjdk.org/jdk/pull/14085/files/c74ed2c6..9cc899b9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14085&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14085&range=01-02

  Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/14085.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14085/head:pull/14085

PR: https://git.openjdk.org/jdk/pull/14085
