On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski <d...@openjdk.org> wrote:
>> This fix recovers XDH performance but removes some of the P256 gains
>> (~-8-14%). Still faster, but not as much.
>>
>> The fix is to undo 'int' return type on mult()/square(), which allowed to
>> return partially reduced result (e.g. this avoids extra reductions when
>> mult() result is fed into addition). This is the behaviour before the
>> Montgomery ECC PR.
>>
>> ---
>> XDH.generateSecret performance
>> before Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8435.277 ± 27.230  ops/s
>>
>> after Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8309.028 ± 22.071  ops/s
>>
>> with this PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8491.268 ± 32.858  ops/s
>>
>> ---
>>
>> P256 performance with/without mult intrinsic:
>>
>> Performance before Montgomery PR:
>>
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6398.727 ±  7.400  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6129.739 ±  5.995  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1889.928 ± 54.660  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1866.339 ± 42.438  ops/s
>>
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1350.745 ± 28.514  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1349.393 ± 32.050  ops/s
>>
>> Performance in master without mult() intrinsic
>>
>> Benchmark ...
>
> Volodymyr Paprotski has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   comment from Sandhya

There are examples in C2 of how to check a method's class holder (the
intrinsic's predicate) before executing the intrinsic code. See, for example,
the code for `_counterMode_AESCrypt` in `library_call.cpp`. I am not sure
whether this is what you are asking for.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174736168