On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski <d...@openjdk.org> wrote:
>> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 >> thrpt 3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 >> thrpt 3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 >> thrpt 3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt 3 1878.955 ± 45.487 ops/s >> Benchmark (algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± >> 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 >> thrpt 3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 >> thrpt 3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 >> thrpt 3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt 3 1932.127 ± 35.920 ops/s >> Benchmark (algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > whitespace src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 168: > 166: XMMRegister broadcast5 = xmm24; > 167: KRegister limb0 = k1; > 168: KRegister limb5 = k2; limb5 and select are not being used anymore. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 185: > 183: __ evmovdquq(modulus, allLimbs, ExternalAddress(modulus_p256()), > false, Assembler::AVX_512bit, rscratch); > 184: > 185: // A = load(*aLimbs) A little bit more description in comments on what the load step involves would be helpful. e.g. Load upper 4 limbs, shift left by 1 limb using perm, or in the lowest limb. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 270: > 268: __ push(r14); > 269: __ push(r15); > 270: No need to save/restore rbx, r12, r14, r15. Only r13 is used as temp in montgomeryMultiply(aLimbs, bLimbs, rLimbs). That too could be easily changed to r8. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 286: > 284: __ mov(aLimbs, c_rarg0); > 285: __ mov(bLimbs, c_rarg1); > 286: __ mov(rLimbs, c_rarg2); We could directly call montgomeryMultiply(c_rarg0, c_rarg1, c_rarg2) then these moves are not necessary. src/hotspot/cpu/x86/vm_version_x86.cpp line 1370: > 1368: > 1369: #ifdef _LP64 > 1370: if (supports_avx512ifma() && supports_avx512vlbw() && MaxVectorSize > >= 64) { No need to tie the intrinsic to MaxVectorSize setting. src/hotspot/share/opto/library_call.cpp line 7564: > 7562: > 7563: if (!stubAddr) return false; > 7564: if (stopped()) return true; Line 7564 seems redundant here as there is no range check or anything like that before this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604169603 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604141586 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604174141 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604175443 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1603792252 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1603865712