On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski <d...@openjdk.org> wrote:
>> This fix recovers XDH performance but removes some of the P256 gains
>> (~-8-14%). Still faster, but not as much.
>>
>> The fix is to undo 'int' return type on mult()/square(), which allowed to
>> return partially reduced result (e.g. this avoids extra reductions when
>> mult() result is fed into addition). This is the behaviour before the
>> Montgomery ECC PR.
>>
>> ---
>> XDH.generateSecret performance
>> before Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8435.277 ± 27.230  ops/s
>>
>> after Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8309.028 ± 22.071  ops/s
>>
>> with this PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8491.268 ± 32.858  ops/s
>>
>> ---
>>
>> P256 performance with/without mult intrinsic:
>>
>> Performance before Montgomery PR:
>>
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6398.727 ±  7.400  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6129.739 ±  5.995  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1889.928 ± 54.660  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1866.339 ± 42.438  ops/s
>>
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1350.745 ± 28.514  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1349.393 ± 32.050  ops/s
>>
>> Performance in master without mult() intrinsic
>>
>> Benchmark ...
>
> Volodymyr Paprotski has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   comment from Sandhya

Looks good to me. It would be good, though, to figure out what else could be done to regain the P256 performance while keeping the speed of this code path.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2189545307
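
For readers unfamiliar with the trade-off in the quoted description (a partially reduced mult() result saves reductions when it is fed straight into an addition), here is a minimal sketch. It is not the OpenJDK IntegerPolynomial code: the class name, the method names, and the toy modulus 2^31 - 1 are all assumptions chosen only to keep the arithmetic inside a Java long.

```java
public class PartialReductionSketch {
    // Toy Mersenne prime modulus (assumption for illustration; not a real curve prime).
    static final long P = (1L << 31) - 1;

    // Full reduction into the canonical range [0, P).
    static long reduce(long x) {
        long r = x % P;
        return r < 0 ? r + P : r;
    }

    // Hypothetical "strict" multiply: always returns a fully reduced value.
    // Inputs are assumed to be in [0, P), so the product fits in a long.
    static long multStrict(long a, long b) {
        return reduce(a * b);
    }

    // Hypothetical "lazy" multiply: one cheap folding step using 2^31 = 1 (mod P).
    // The result is congruent to a*b mod P but may be larger than P
    // (it is only guaranteed to be below 2^32).
    static long multLazy(long a, long b) {
        long prod = a * b;                    // < 2^62 for inputs in [0, P)
        return (prod & P) + (prod >>> 31);    // low 31 bits + remaining high bits
    }

    // a*b + c*d with strict multiplies: three full reductions in total.
    static long sumStrict(long a, long b, long c, long d) {
        return reduce(multStrict(a, b) + multStrict(c, d));
    }

    // a*b + c*d with lazy multiplies: a single full reduction at the end.
    static long sumLazy(long a, long b, long c, long d) {
        return reduce(multLazy(a, b) + multLazy(c, d));
    }

    public static void main(String[] args) {
        long a = 123456789L, b = 987654321L, c = 55555L, d = 77777L;
        // Both paths must agree on the canonical result.
        System.out.println(sumStrict(a, b, c, d) == sumLazy(a, b, c, d));
    }
}
```

In this sketch the lazy path defers the expensive reduction until the value actually has to be canonical, which is the kind of saving the description attributes to returning a partially reduced result.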