On Sun, 9 Oct 2022 06:53:19 GMT, Xue-Lei Andrew Fan <xue...@openjdk.org> wrote:
> Hi, > > May I have this update reviewed? With this update, the result will be > reduced if required in EC limbs operations in the JDK implementation. > > In the current implementation, the EC limbs addition and subtraction result > is not reduced before the value is returned. This behavior is different from > multiplication and square. To get it right, a maximum addition limit is > introduced for EC limbs addition and subtraction. If the limit is reached, > an exception will be thrown so as to indicate an EC implementation problem. > The exception is not recoverable from the viewpoint of an application. > > The design is error prone as the EC implementation code must be carefully > checked so that the limit cannot reach. If a reach is noticed, a reduce > operation would have to be hard-coded additionally. In the FieldGen.java > implementation, the limit can only be 1 or 2 as the implementation code is > only able to handle 2. But the FieldGen.java does not really check if 1 or 2 > really works for the specific filed generation parameters. > > The design impact the performance as well. Because of this limit, the maximum > limb size is 28 bits for 2 max adding limit. Otherwise there are integer > (long) overflow issues. For example for 256 bits curves, 10 limbs are > required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing 1 > limbs from 10 to 9, the Signature performance could get improved by 20%. > > In the IntegerPolynomial class description, it is said "All IntegerPolynomial > implementations allow at most one addition before multiplication. Additions > after that will result in an ArithmeticException." It's too strict to follow > without exam the code very carefully. Indeed, the implementation does not > really follow the spec, and 2 addition may be allowed. > > It would be nice if there is no addition limit, and then we are free from > these issues. It is doable by reducing the limbs if required for EC limbs > operations. > > The benchmark for ECDSA Signature is checked in order to see how much the > performance could be impacted. Here are the benchmark numbers before the > patch applied: > > Benchmark (curveName) (messageLength) Mode Cnt Score Error > Units > Signatures.sign secp256r1 64 thrpt 15 1417.507 ± 5.809 > ops/s > Signatures.sign secp256r1 512 thrpt 15 1412.620 ± 7.040 > ops/s > Signatures.sign secp256r1 2048 thrpt 15 1417.364 ± 6.643 > ops/s > Signatures.sign secp256r1 16384 thrpt 15 1364.720 ± 44.354 > ops/s > Signatures.sign secp384r1 64 thrpt 15 609.924 ± 9.858 > ops/s > Signatures.sign secp384r1 512 thrpt 15 610.873 ± 5.519 > ops/s > Signatures.sign secp384r1 2048 thrpt 15 608.573 ± 13.324 > ops/s > Signatures.sign secp384r1 16384 thrpt 15 597.802 ± 7.074 > ops/s > Signatures.sign secp521r1 64 thrpt 15 301.954 ± 6.449 > ops/s > Signatures.sign secp521r1 512 thrpt 15 300.774 ± 5.849 > ops/s > Signatures.sign secp521r1 2048 thrpt 15 295.831 ± 8.423 > ops/s > Signatures.sign secp521r1 16384 thrpt 15 276.258 ± 43.146 > ops/s > Signatures.sign Ed25519 64 thrpt 15 1171.878 ± 4.187 > ops/s > Signatures.sign Ed25519 512 thrpt 15 1175.175 ± 4.611 > ops/s > Signatures.sign Ed25519 2048 thrpt 15 1168.459 ± 5.750 > ops/s > Signatures.sign Ed25519 16384 thrpt 15 1086.906 ± 6.514 > ops/s > Signatures.sign Ed448 64 thrpt 15 326.475 ± 3.274 > ops/s > Signatures.sign Ed448 512 thrpt 15 328.709 ± 3.867 > ops/s > Signatures.sign Ed448 2048 thrpt 15 315.480 ± 15.377 > ops/s > Signatures.sign Ed448 16384 thrpt 15 312.388 ± 1.067 > ops/s > > > and here are the numbers with this patch applied: > > Benchmark (curveName) (messageLength) Mode Cnt Score Error > Units > Signatures.sign secp256r1 64 thrpt 15 1377.447 ± 38.889 > ops/s > Signatures.sign secp256r1 512 thrpt 15 1403.149 ± 5.151 > ops/s > Signatures.sign secp256r1 2048 thrpt 15 1401.101 ± 6.937 > ops/s > Signatures.sign secp256r1 16384 thrpt 15 1381.855 ± 10.606 > ops/s > Signatures.sign secp384r1 64 thrpt 15 595.979 ± 1.967 > ops/s > Signatures.sign secp384r1 512 thrpt 15 592.419 ± 6.087 > ops/s > Signatures.sign secp384r1 2048 thrpt 15 581.800 ± 18.815 > ops/s > Signatures.sign secp384r1 16384 thrpt 15 583.224 ± 9.856 > ops/s > Signatures.sign secp521r1 64 thrpt 15 264.772 ± 36.883 > ops/s > Signatures.sign secp521r1 512 thrpt 15 302.084 ± 1.074 > ops/s > Signatures.sign secp521r1 2048 thrpt 15 296.619 ± 3.272 > ops/s > Signatures.sign secp521r1 16384 thrpt 15 290.112 ± 5.085 > ops/s > Signatures.sign Ed25519 64 thrpt 15 1163.293 ± 4.266 > ops/s > Signatures.sign Ed25519 512 thrpt 15 1157.093 ± 8.145 > ops/s > Signatures.sign Ed25519 2048 thrpt 15 1140.900 ± 8.660 > ops/s > Signatures.sign Ed25519 16384 thrpt 15 1053.233 ± 40.591 > ops/s > Signatures.sign Ed448 64 thrpt 15 317.509 ± 6.870 > ops/s > Signatures.sign Ed448 512 thrpt 15 320.975 ± 1.534 > ops/s > Signatures.sign Ed448 2048 thrpt 15 322.157 ± 1.914 > ops/s > Signatures.sign Ed448 16384 thrpt 15 303.532 ± 6.419 > ops/s > > > From the numbers, it looks like the performance impact is limited. Based on > this update, the performance could get rewarded by using less limbs. For > examples if using less limbs for secp256r1/secp384r1/secp521r1/curve25519, > [see PR for PR for JDK-8294248](https://github.com/openjdk/jdk/pull/10398), > and run the benchmark together with this patch, I could see the performance > improvement as follow (note: No update for Ed448 as using less limbs does not > improve the performance for Ed448): > > Benchmark (curveName) (messageLength) Mode Cnt Score Error > Units > Signatures.sign secp256r1 64 thrpt 15 1769.262 ± 4.251 > ops/s > Signatures.sign secp256r1 512 thrpt 15 1764.754 ± 12.185 > ops/s > Signatures.sign secp256r1 2048 thrpt 15 1737.499 ± 43.937 > ops/s > Signatures.sign secp256r1 16384 thrpt 15 1733.932 ± 7.938 > ops/s > Signatures.sign secp384r1 64 thrpt 15 636.614 ± 6.752 > ops/s > Signatures.sign secp384r1 512 thrpt 15 629.449 ± 8.000 > ops/s > Signatures.sign secp384r1 2048 thrpt 15 645.898 ± 5.655 > ops/s > Signatures.sign secp384r1 16384 thrpt 15 634.530 ± 11.846 > ops/s > Signatures.sign secp521r1 64 thrpt 15 334.413 ± 6.710 > ops/s > Signatures.sign secp521r1 512 thrpt 15 335.088 ± 6.630 > ops/s > Signatures.sign secp521r1 2048 thrpt 15 339.729 ± 1.450 > ops/s > Signatures.sign secp521r1 16384 thrpt 15 327.597 ± 2.341 > ops/s > Signatures.sign Ed25519 64 thrpt 15 1361.028 ± 2.338 > ops/s > Signatures.sign Ed25519 512 thrpt 15 1359.072 ± 3.748 > ops/s > Signatures.sign Ed25519 2048 thrpt 15 1347.409 ± 9.172 > ops/s > Signatures.sign Ed25519 16384 thrpt 15 1250.794 ± 11.750 > ops/s > Signatures.sign Ed448 64 thrpt 15 321.312 ± 5.826 > ops/s > Signatures.sign Ed448 512 thrpt 15 326.592 ± 4.213 > ops/s > Signatures.sign Ed448 2048 thrpt 15 322.183 ± 7.005 > ops/s > Signatures.sign Ed448 16384 thrpt 15 299.869 ± 16.594 > ops/s > ``` > > If considering this PR together with [PR for > JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance > improvement could be expected for EC crypto primitives. > > Thanks, > Xuelei This pull request has now been integrated. Changeset: b778cd52 Author: Xue-Lei Andrew Fan <xue...@openjdk.org> URL: https://git.openjdk.org/jdk/commit/b778cd52b3fae013ecb21d90bcf940a4d947bd68 Stats: 141 lines in 8 files changed: 84 ins; 34 del; 23 mod 8295010: Reduce if required in EC limbs operations Reviewed-by: djelinski, jjiang ------------- PR: https://git.openjdk.org/jdk/pull/10624