On Mon, 22 May 2023 14:23:15 GMT, Andrew Haley <a...@openjdk.org> wrote:

> This provides a solid speedup of about 3-4x over the Java implementation.
>
> I have a vectorized version of this which uses a bunch of tricks to speed it
> up, but it's complex and can still be improved. We're getting close to ramp
> down, so I'm submitting this simple intrinsic so that we can get it reviewed
> in time.
>
> Benchmarks:
>
> ThunderX (2, I think):
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3   4516486.795 ±  419624.224  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   1228542.774 ±  202815.694  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3    316051.912 ±   23066.449  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     20649.561 ±    1094.687  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3       310.564 ±      31.053  ops/s
>
> Apple M1:
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ±  849843.905  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±   63417.224  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±   29208.265  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±    1975.998  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±      40.240  ops/s
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ±  184757.721  ops/s
> Poly1305DigestBench.updateBytes         256              th...

This pull request has now been integrated.

Changeset: dc21e8aa
Author:    Andrew Haley <a...@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/dc21e8aa8321abb161bbbc02ca379eda27a4984c
Stats:     195 lines in 4 files changed: 194 ins; 0 del; 1 mod

8296411: AArch64: Accelerated Poly1305 intrinsics

Reviewed-by: redestad, adinn

-------------

PR: https://git.openjdk.org/jdk/pull/14085