Hi,

May I have this update reviewed?  With this update, the result will be always 
reduced in EC limbs addition and subtraction operations in the JDK 
implementation.

In the current implementation, the EC limbs addition and subtraction result is 
not reduced before the value is returned.  This behavior is different from 
multiplication and square.  To get it right, a maximum addition limit is 
introduced for EC limbs addition and subtraction.  If the limit is reached, an 
exception will be thrown so as to indicate an EC implementation problem.  The 
exception is not recoverable from the viewpoint of an application.

The design is error prone as the EC implementation code must be carefully 
checked so that the limit cannot reach. If a reach is noticed, a reduce 
operation would have to be hard-coded additionally. In the FieldGen.java 
implementation, the limit can only be 1 or 2 as the implementation code is only 
able to handle 2.  But the FieldGen.java does not really check if 1 or 2 really 
works for the specific filed generation parameters.

The design impact the performance as well. Because of this limit, the maximum 
limb size is 28 bits for 2 max adding limit. Otherwise there are integer (long) 
overflow issues. For example for 256 bits curves, 10 limbs are required for 
28-bit limb size; and 9 limbs for 29-bit size. By reducing 1 limbs from 10 to 
9, the Signature performance could get improved by 20%.

In the IntegerPolynomial class description, it is said "All IntegerPolynomial 
implementations allow at most one addition before multiplication. Additions 
after that will result in an ArithmeticException." It's too strict to follow 
without exam the code very carefully. Indeed, the implementation does not 
really follow the spec, and 2 addition may be allowed.

It would be nice if there is no addition limit, and then we are free from these 
issues. It is doable by reducing the limbs in the adding and subtraction 
operations, as other operations as multiplication. And then the code related to 
the limit is not necessary any longer. The code can do any many additions as 
required.

The benchmark for ECDSA Signature is checked in order to see how much the 
performance could be impacted. Here are the benchmark numbers before the patch 
applied:

Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign               64  thrpt   15  1414.463 ±  9.616  ops/s
Signatures.sign              512  thrpt   15  1409.023 ± 14.636  ops/s
Signatures.sign             2048  thrpt   15  1414.488 ±  4.728  ops/s
Signatures.sign            16384  thrpt   15  1385.685 ±  4.476  ops/s


and here are the numbers with this patch applied:

Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign               64  thrpt   15  1293.658 ±  8.358  ops/s
Signatures.sign              512  thrpt   15  1298.404 ±  4.344  ops/s
Signatures.sign             2048  thrpt   15  1289.732 ± 19.748  ops/s
Signatures.sign            16384  thrpt   15  1278.980 ± 13.483  ops/s


>From the numbers, it looks like the performance impact is about 10%.  However, 
>the performance impact could get rewarded by using less limbs.  For examples 
>for secp256r1, I updated the limbs from [10 to 9 for integer 
>operations](https://github.com/openjdk/jdk/pull/10398), and run the benchmark 
>together with this patch, I could see the performance improvement as follow:

Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
Signatures.sign               64  thrpt   15  1578.568 ± 11.000  ops/s
Signatures.sign              512  thrpt   15  1596.058 ±  4.184  ops/s
Signatures.sign             2048  thrpt   15  1591.519 ±  6.444  ops/s
Signatures.sign            16384  thrpt   15  1563.480 ±  6.837  ops/s
``` 

The following are numbers for secp256r1 key generation, without this patch:

Benchmark                      Mode  Cnt     Score   Error  Units
KeyPairGenerators.keyPairGen  thrpt   15  1531.026 ± 9.869  ops/s


With this patch:

Benchmark                      Mode  Cnt     Score    Error  Units
KeyPairGenerators.keyPairGen  thrpt   15  1367.162 ± 50.322  ops/s


with improved limbs (from 10 to 9):

Benchmark                      Mode  Cnt     Score    Error  Units
KeyPairGenerators.keyPairGen  thrpt   15  1713.097 ± 8.003  ops/s

If considering this PR together with [PR for 
JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance 
improvement could be expected for EC crypto primitives.

Thanks,
Xuelei

-------------

Commit messages:
 - add the key pair generation bench code
 - remove tabs
 - 8295010: EC limbs addition and subtraction limit

Changes: https://git.openjdk.org/jdk/pull/10624/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10624&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8295010
  Stats: 256 lines in 13 files changed: 128 ins; 94 del; 34 mod
  Patch: https://git.openjdk.org/jdk/pull/10624.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10624/head:pull/10624

PR: https://git.openjdk.org/jdk/pull/10624

Reply via email to