On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski <d...@openjdk.org> wrote:

>> This fix recovers XDH performance but gives up some of the P256 gains 
>> (roughly 8-14%). P256 is still faster, just by a smaller margin.
>> 
>> The fix is to undo the 'int' return type on mult()/square(), which allowed 
>> them to return a partially reduced result (e.g. this avoids extra 
>> reductions when the mult() result is fed into an addition). This restores 
>> the behaviour from before the Montgomery ECC PR.
>> 
>> ---
>> XDH.generateSecret performance 
>> before Montgomery PR:
>> 
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8435.277 ± 27.230  ops/s
>> 
>> after Montgomery PR:
>> 
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8309.028 ± 22.071  ops/s
>> 
>> with this PR:
>> 
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8491.268 ± 32.858  ops/s
>> 
>> ---
>> 
>> P256 performance with/without mult intrinsic:
>> 
>> Performance before Montgomery PR:
>> 
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6398.727 ±  7.400  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6129.739 ±  5.995  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1889.928 ± 54.660  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1866.339 ± 42.438  ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1350.745 ± 28.514  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1349.393 ± 32.050  ops/s
>> 
>> Performance in master without mult() intrinsic
>> 
>> Benchmark                        ...
>
> Volodymyr Paprotski has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   comment from Sandhya

Looks good to me. It would be good, though, to figure out what else could be 
done to regain the P256 performance while keeping the speed of this code path.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2189545307
