On Mon, 22 May 2023 14:23:15 GMT, Andrew Haley <a...@openjdk.org> wrote:

> This provides a solid speedup of about 3-4x over the Java implementation.
> 
> I have a vectorized version of this which uses a bunch of tricks to speed it
> up, but it's complex and can still be improved. We're getting close to ramp
> down, so I'm submitting this simple intrinsic so that we can get it reviewed
> in time.
> 
> Benchmarks:
> 
> 
> ThunderX (2, I think):
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3   4516486.795 ±  419624.224  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   1228542.774 ±  202815.694  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3    316051.912 ±   23066.449  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     20649.561 ±    1094.687  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3       310.564 ±      31.053  ops/s
> 
> Apple M1:
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ±  849843.905  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±   63417.224  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±   29208.265  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±    1975.998  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±      40.240  ops/s
> 
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ±  184757.721  ops/s
> Poly1305DigestBench.updateBytes         256              th...

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 573:

> 571: }
> 572: 
> 573: if (FLAG_IS_DEFAULT(UsePoly1305Intrinsics)) {

Incorrect indentation: extra space.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14085#discussion_r1201408065