On Mon, 22 May 2023 14:23:15 GMT, Andrew Haley <[email protected]> wrote:
> This provides a solid speedup of about 3-4x over the Java implementation.
>
> I have a vectorized version of this which uses a bunch of tricks to speed it
> up, but it's complex and can still be improved. We're getting close to ramp
> down, so I'm submitting this simple intrinsic so that we can get it reviewed
> in time.
>
> Benchmarks:
>
>
> ThunderX (2, I think):
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3   4516486.795 ±  419624.224  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   1228542.774 ±  202815.694  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3    316051.912 ±   23066.449  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     20649.561 ±    1094.687  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3       310.564 ±      31.053  ops/s
>
> Apple M1:
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ±  849843.905  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±   63417.224  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±   29208.265  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±    1975.998  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±      40.240  ops/s
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ±  184757.721  ops/s
> Poly1305DigestBench.updateBytes 256 th...
src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 573:
> 571: }
> 572:
> 573: if (FLAG_IS_DEFAULT(UsePoly1305Intrinsics)) {
Incorrect indentation: this line has an extra space.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14085#discussion_r1201408065