On Wed, 23 Nov 2022 06:21:05 GMT, Xue-Lei Andrew Fan <xue...@openjdk.org> wrote:
>> Hi, >> >> May I have this update reviewed? With this update, the result will be >> reduced if required in EC limbs operations in the JDK implementation. >> >> In the current implementation, the EC limbs addition and subtraction result >> is not reduced before the value is returned. This behavior is different >> from multiplication and square. To get it right, a maximum addition limit >> is introduced for EC limbs addition and subtraction. If the limit is >> reached, an exception will be thrown so as to indicate an EC implementation >> problem. The exception is not recoverable from the viewpoint of an >> application. >> >> The design is error prone as the EC implementation code must be carefully >> checked so that the limit cannot reach. If a reach is noticed, a reduce >> operation would have to be hard-coded additionally. In the FieldGen.java >> implementation, the limit can only be 1 or 2 as the implementation code is >> only able to handle 2. But the FieldGen.java does not really check if 1 or >> 2 really works for the specific filed generation parameters. >> >> The design impact the performance as well. Because of this limit, the >> maximum limb size is 28 bits for 2 max adding limit. Otherwise there are >> integer (long) overflow issues. For example for 256 bits curves, 10 limbs >> are required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing >> 1 limbs from 10 to 9, the Signature performance could get improved by 20%. >> >> In the IntegerPolynomial class description, it is said "All >> IntegerPolynomial implementations allow at most one addition before >> multiplication. Additions after that will result in an ArithmeticException." >> It's too strict to follow without exam the code very carefully. Indeed, the >> implementation does not really follow the spec, and 2 addition may be >> allowed. >> >> It would be nice if there is no addition limit, and then we are free from >> these issues. It is doable by reducing the limbs if required for EC limbs >> operations. >> >> The benchmark for ECDSA Signature is checked in order to see how much the >> performance could be impacted. Here are the benchmark numbers before the >> patch applied: >> >> Benchmark (curveName) (messageLength) Mode Cnt Score Error >> Units >> Signatures.sign secp256r1 64 thrpt 15 1417.507 ± 5.809 >> ops/s >> Signatures.sign secp256r1 512 thrpt 15 1412.620 ± 7.040 >> ops/s >> Signatures.sign secp256r1 2048 thrpt 15 1417.364 ± 6.643 >> ops/s >> Signatures.sign secp256r1 16384 thrpt 15 1364.720 ± 44.354 >> ops/s >> Signatures.sign secp384r1 64 thrpt 15 609.924 ± 9.858 >> ops/s >> Signatures.sign secp384r1 512 thrpt 15 610.873 ± 5.519 >> ops/s >> Signatures.sign secp384r1 2048 thrpt 15 608.573 ± 13.324 >> ops/s >> Signatures.sign secp384r1 16384 thrpt 15 597.802 ± 7.074 >> ops/s >> Signatures.sign secp521r1 64 thrpt 15 301.954 ± 6.449 >> ops/s >> Signatures.sign secp521r1 512 thrpt 15 300.774 ± 5.849 >> ops/s >> Signatures.sign secp521r1 2048 thrpt 15 295.831 ± 8.423 >> ops/s >> Signatures.sign secp521r1 16384 thrpt 15 276.258 ± 43.146 >> ops/s >> Signatures.sign Ed25519 64 thrpt 15 1171.878 ± 4.187 >> ops/s >> Signatures.sign Ed25519 512 thrpt 15 1175.175 ± 4.611 >> ops/s >> Signatures.sign Ed25519 2048 thrpt 15 1168.459 ± 5.750 >> ops/s >> Signatures.sign Ed25519 16384 thrpt 15 1086.906 ± 6.514 >> ops/s >> Signatures.sign Ed448 64 thrpt 15 326.475 ± 3.274 >> ops/s >> Signatures.sign Ed448 512 thrpt 15 328.709 ± 3.867 >> ops/s >> Signatures.sign Ed448 2048 thrpt 15 315.480 ± 15.377 >> ops/s >> Signatures.sign Ed448 16384 thrpt 15 312.388 ± 1.067 >> ops/s >> >> >> and here are the numbers with this patch applied: >> >> Benchmark (curveName) (messageLength) Mode Cnt Score Error >> Units >> Signatures.sign secp256r1 64 thrpt 15 1377.447 ± 38.889 >> ops/s >> Signatures.sign secp256r1 512 thrpt 15 1403.149 ± 5.151 >> ops/s >> Signatures.sign secp256r1 2048 thrpt 15 1401.101 ± 6.937 >> ops/s >> Signatures.sign secp256r1 16384 thrpt 15 1381.855 ± 10.606 >> ops/s >> Signatures.sign secp384r1 64 thrpt 15 595.979 ± 1.967 >> ops/s >> Signatures.sign secp384r1 512 thrpt 15 592.419 ± 6.087 >> ops/s >> Signatures.sign secp384r1 2048 thrpt 15 581.800 ± 18.815 >> ops/s >> Signatures.sign secp384r1 16384 thrpt 15 583.224 ± 9.856 >> ops/s >> Signatures.sign secp521r1 64 thrpt 15 264.772 ± 36.883 >> ops/s >> Signatures.sign secp521r1 512 thrpt 15 302.084 ± 1.074 >> ops/s >> Signatures.sign secp521r1 2048 thrpt 15 296.619 ± 3.272 >> ops/s >> Signatures.sign secp521r1 16384 thrpt 15 290.112 ± 5.085 >> ops/s >> Signatures.sign Ed25519 64 thrpt 15 1163.293 ± 4.266 >> ops/s >> Signatures.sign Ed25519 512 thrpt 15 1157.093 ± 8.145 >> ops/s >> Signatures.sign Ed25519 2048 thrpt 15 1140.900 ± 8.660 >> ops/s >> Signatures.sign Ed25519 16384 thrpt 15 1053.233 ± 40.591 >> ops/s >> Signatures.sign Ed448 64 thrpt 15 317.509 ± 6.870 >> ops/s >> Signatures.sign Ed448 512 thrpt 15 320.975 ± 1.534 >> ops/s >> Signatures.sign Ed448 2048 thrpt 15 322.157 ± 1.914 >> ops/s >> Signatures.sign Ed448 16384 thrpt 15 303.532 ± 6.419 >> ops/s >> >> >> From the numbers, it looks like the performance impact is limited. Based on >> this update, the performance could get rewarded by using less limbs. For >> examples if using less limbs for secp256r1/secp384r1/secp521r1/curve25519, >> [see PR for PR for JDK-8294248](https://github.com/openjdk/jdk/pull/10398), >> and run the benchmark together with this patch, I could see the performance >> improvement as follow (note: No update for Ed448 as using less limbs does >> not improve the performance for Ed448): >> >> Benchmark (curveName) (messageLength) Mode Cnt Score Error >> Units >> Signatures.sign secp256r1 64 thrpt 15 1769.262 ± 4.251 >> ops/s >> Signatures.sign secp256r1 512 thrpt 15 1764.754 ± 12.185 >> ops/s >> Signatures.sign secp256r1 2048 thrpt 15 1737.499 ± 43.937 >> ops/s >> Signatures.sign secp256r1 16384 thrpt 15 1733.932 ± 7.938 >> ops/s >> Signatures.sign secp384r1 64 thrpt 15 636.614 ± 6.752 >> ops/s >> Signatures.sign secp384r1 512 thrpt 15 629.449 ± 8.000 >> ops/s >> Signatures.sign secp384r1 2048 thrpt 15 645.898 ± 5.655 >> ops/s >> Signatures.sign secp384r1 16384 thrpt 15 634.530 ± 11.846 >> ops/s >> Signatures.sign secp521r1 64 thrpt 15 334.413 ± 6.710 >> ops/s >> Signatures.sign secp521r1 512 thrpt 15 335.088 ± 6.630 >> ops/s >> Signatures.sign secp521r1 2048 thrpt 15 339.729 ± 1.450 >> ops/s >> Signatures.sign secp521r1 16384 thrpt 15 327.597 ± 2.341 >> ops/s >> Signatures.sign Ed25519 64 thrpt 15 1361.028 ± 2.338 >> ops/s >> Signatures.sign Ed25519 512 thrpt 15 1359.072 ± 3.748 >> ops/s >> Signatures.sign Ed25519 2048 thrpt 15 1347.409 ± 9.172 >> ops/s >> Signatures.sign Ed25519 16384 thrpt 15 1250.794 ± 11.750 >> ops/s >> Signatures.sign Ed448 64 thrpt 15 321.312 ± 5.826 >> ops/s >> Signatures.sign Ed448 512 thrpt 15 326.592 ± 4.213 >> ops/s >> Signatures.sign Ed448 2048 thrpt 15 322.183 ± 7.005 >> ops/s >> Signatures.sign Ed448 16384 thrpt 15 299.869 ± 16.594 >> ops/s >> ``` >> >> If considering this PR together with [PR for >> JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance >> improvement could be expected for EC crypto primitives. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request with a new target base due to > a merge or a rebase. The pull request now contains eight commits: > > - Merge > - remove duplicated bench test > - Merge master > - missed reduce > - reduce if needed > - add the key pair generation bench code > - remove tabs > - 8295010: EC limbs addition and subtraction limit Marked as reviewed by jjiang (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10624