Re: RFR: 8355216: Accelerate P-256 arithmetic on aarch64 [v11]

Andrew Dinn Tue, 23 Jun 2026 06:06:16 -0700

On Tue, 23 Jun 2026 12:17:30 GMT, Ferenc Rakoczi <[email protected]> wrote:


> I think it is rather unfortunate that this method was added to this 
> microbenchmark suite as its contribution to the run time of any real crypto 
> operation is minimal, so it makes almost no difference if it runs twice as 
> fast. However, it is important that it runs in constant time (i.e. its 
> running time is independent of the values in its input arrays and, more 
> importantly, whether the value of the "set" argument is 0 or 1). The java 
> code was written in such a way, but there is no guarantee that the compiler 
> will not change it back to using a branch instead of the xors if it can 
> figure out that only those 2 values are possible for "set". So the intrinsic 
> here is more for guaranteeing "set" value independent execution than for any 
> performance gains.

Yes, the assign benchmark is definitely of no use for measuring the performance
of the intrinsic relative to the Java code, given that it hard-wires `set` in
each call. I'm not sure it is even of much interest, once we do have an
intrinsic, for measuring a difference between the cases `montBench == true` and
`montBench == false`. Whatever the input polynomial type (P256 or
MontgomeryP256), the assign is going to exercise exactly the same intrinsic code.

Your point that the intrinsic runs in constant time is the best argument for 
keeping it. So, I'm happy to push this as is. The same consideration would 
apply for x86.
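
For anyone reading along, the branch-free pattern discussed in the quoted text
can be sketched roughly as follows. This is a minimal illustration, not the JDK
source: the class name `ConditionalAssignSketch` and the exact method shape are
assumptions made for the example. As the quoted text points out, plain Java
like this carries no guarantee that the compiler keeps it branch-free, which is
the gap the intrinsic is meant to close.

```java
import java.util.Arrays;

// Hypothetical illustration only; names are not taken from the JDK sources.
final class ConditionalAssignSketch {

    // Copies b into a when set == 1 and leaves a unchanged when set == 0.
    // The same sequence of loads, xors and stores runs in both cases.
    // Assumes set is 0 or 1 and that b is at least as long as a.
    static void conditionalAssign(int set, long[] a, long[] b) {
        // 0 stays 0; 1 becomes -1, which widens to an all-ones long mask.
        long mask = -set;
        for (int i = 0; i < a.length; i++) {
            // diff is (a[i] ^ b[i]) when set == 1, otherwise 0.
            long diff = mask & (a[i] ^ b[i]);
            // a[i] ^ (a[i] ^ b[i]) == b[i], and a[i] ^ 0 == a[i].
            a[i] ^= diff;
        }
    }

    public static void main(String[] args) {
        long[] a = {1L, 2L, 3L, 4L};
        long[] b = {9L, 8L, 7L, 6L};

        conditionalAssign(0, a, b);
        System.out.println(Arrays.toString(a)); // a is unchanged

        conditionalAssign(1, a, b);
        System.out.println(Arrays.toString(a)); // a now holds b's values
    }
}
```

The sketch only shows that the Java source has no data-dependent branch on
`set`; whether the generated machine code stays that way depends on the
compiler.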

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30941#issuecomment-4779475235
