On Tue, 2 Apr 2024 19:19:59 GMT, Volodymyr Paprotski <d...@openjdk.org> wrote:
>> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 >> thrpt 3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 >> thrpt 3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 >> thrpt 3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt 3 1878.955 ± 45.487 ops/s >> Benchmark (algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± >> 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 >> thrpt 3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 >> thrpt 3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 >> thrpt 3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt 3 1932.127 ± 35.920 ops/s >> Benchmark (algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt 3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > remove use of jdk.crypto.ec Few early comments. Please update the copyright year of all the modified files. You can even consider splitting this into two patches, Java side changes in one and x86 optimized intrinsic in next one. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 39: > 37: }; > 38: static address modulus_p256() { > 39: return (address)MODULUS_P256; Long constants should have UL suffix. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 386: > 384: __ jcc(Assembler::equal, L_Length19); > 385: > 386: // Default copy loop Please add appropriate loop entry alignment. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 394: > 392: __ lea(aLimbs, Address(aLimbs,8)); > 393: __ lea(bLimbs, Address(bLimbs,8)); > 394: __ jmp(L_DefaultLoop); Both sub and cmp are flag affecting instructions and are macro-fusible. By doing a loop rotation i.e. moving the length <= 0 check outside the loop and pushing the loop exit check at bottom you can save additional compare checks. ------------- PR Review: https://git.openjdk.org/jdk/pull/18583#pullrequestreview-1981555803 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1553056633 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1552710600 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1553110376