On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk <d...@openjdk.org> wrote:

> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 
> message blocks at a time. For more details, left a lot of comments in 
> `macroAssembler_x86_poly.cpp`.
> 
> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and 
> java.
>   - Would like to add an `InvalidKeyException` in `Poly1305.java` (see 
> commented out block in that file), but that conflicts with the KAT. I do 
> think we should detect (R==0 || S ==0) so would like advice please.
> - Added a JMH perf test.
>    - JMH test had to use reflection (instead of existing `MacBench.java`), 
> since Poly1305 is not 'properly' registered with the provider.
> 
> Perf before:
> 
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2961300.661 ± 110554.162  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1791912.962 ±  86696.037  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8   637413.054 ±  14074.655  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8    48762.991 ±    390.921  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8      769.872 ±      1.402  ops/s
> 
> and after:
> 
> Benchmark                   (dataSize)  (provider)   Mode  Cnt        Score        Error  Units
> Poly1305DigestBench.digest          64              thrpt    8  2841243.668 ± 154528.057  ops/s
> Poly1305DigestBench.digest         256              thrpt    8  1662003.873 ±  95253.445  ops/s
> Poly1305DigestBench.digest        1024              thrpt    8  1770028.718 ± 100847.766  ops/s
> Poly1305DigestBench.digest       16384              thrpt    8   765547.287 ±  25883.825  ops/s
> Poly1305DigestBench.digest     1048576              thrpt    8    14508.458 ±     56.147  ops/s

src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 262:

> 260:     private static void processMultipleBlocks(byte[] input, int offset, int length, byte[] aBytes, byte[] rBytes) {
> 261:         MutableIntegerModuloP A = ipl1305.getElement(aBytes).mutable();
> 262:         MutableIntegerModuloP R = ipl1305.getElement(rBytes).mutable();

R doesn't need to be mutable.

src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 286:

> 284:      * numeric values.
> 285:      */
> 286:     private void setRSVals() { //throws InvalidKeyException {

The check for an invalid key (R or S with all bytes zero) could be submitted
as a separate PR; it is not related to the Poly1305 acceleration.
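
For reference, a minimal sketch of what such a check might look like. It
assumes the standard 32-byte Poly1305 key layout (r in the first 16 bytes,
s in the last 16) and tests the raw key bytes before any clamping. The class
name `Poly1305KeyCheck` and both helper methods are hypothetical and are not
taken from `Poly1305.java`.

```java
import java.security.InvalidKeyException;

// Hypothetical helper, not part of the JDK sources.
class Poly1305KeyCheck {
    private static final int KEY_LENGTH = 32; // r || s
    private static final int RS_LENGTH = 16;  // length of each half

    // Returns true if every byte in buf[offset, offset + length) is zero.
    // OR-ing all bytes together avoids an early exit on the first non-zero byte.
    static boolean isAllZero(byte[] buf, int offset, int length) {
        int acc = 0;
        for (int i = offset; i < offset + length; i++) {
            acc |= buf[i];
        }
        return acc == 0;
    }

    // Rejects keys of the wrong length and keys whose r or s half is all zero.
    static void checkKey(byte[] key) throws InvalidKeyException {
        if (key == null || key.length != KEY_LENGTH) {
            throw new InvalidKeyException("Poly1305 key must be 32 bytes");
        }
        if (isAllZero(key, 0, RS_LENGTH)
                || isAllZero(key, RS_LENGTH, RS_LENGTH)) {
            throw new InvalidKeyException("Poly1305 key has all-zero R or S");
        }
    }
}
```

Whether the check belongs before or after R is clamped is left open here; the
sketch only covers the "all bytes zero" case mentioned above.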

test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/java.base/com/sun/crypto/provider/Poly1305IntrinsicFuzzTest.java line 39:

> 37:         public static void main(String[] args) throws Exception {
> 38:                 //Note: it might be useful to increase this number during development of new Poly1305 intrinsics
> 39:                 final int repeat = 100;

Should we increase this repeat count so that the C2 compiler kicks in,
compiles engineUpdate(), and the call to the stub is in place from then on?

test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/java.base/com/sun/crypto/provider/Poly1305KAT.java line 133:

> 131:             System.out.println("*** Test " + ++testNumber + ": " +
> 132:                     test.testName);
> 133:             if (runSingleTest(test)) {

runSingleTest may need to be called enough times for engineUpdate() to be
compiled by C2.
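
A rough sketch of what that repetition could look like. The class
`WarmupSketch` and the method `runRepeated` are hypothetical names, and the
iteration count passed in is up to the caller: the number of calls needed
before C2 compiles a method depends on the JVM's compilation thresholds and
flags, so any fixed value is a guess.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the test in this PR.
class WarmupSketch {
    // Runs the same test 'iterations' times and returns how many runs failed.
    // Repeating the call gives the JIT a chance to compile the hot path, so
    // later iterations can exercise compiled code instead of the interpreter.
    static int runRepeated(BooleanSupplier test, int iterations) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            if (!test.getAsBoolean()) {
                failures++;
            }
        }
        return failures;
    }
}
```

In the KAT this might be used as
`WarmupSketch.runRepeated(() -> runSingleTest(test), n)`, assuming
runSingleTest does not throw checked exceptions; otherwise a plain loop around
the existing call would do the same job.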

-------------

PR: https://git.openjdk.org/jdk/pull/10582
