On Sun, 1 Mar 2026 18:17:09 GMT, Shawn Emery <[email protected]> wrote:
>> This implementation changes the limb size of X25519 from 10 x 26 bits to 5 x
>> 51 bits in order to take advantage of performance gains from a reduction in
>> the number of limb operations.
>>
>> Performance gains were observed from the key generation/agreement,
>> encapsulation, and decapsulation benchmarks for both aarch64 with 3 - 8%
>> gains and x86_64 with 9% gains.
>>
>> Thank you @ferakocz for their help in working through the early stages of
>> this code with me.
>
> Shawn Emery has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Remove IntrinsicCandidate for mult() method
It would be nice if you could add comments to the methods about their
requirements for the inputs and what they produce as a result (even in the
ancestor classes).
src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
line 39:
> 37: private static final long LIMB_MASK = -1L >>> (64 - BITS_PER_LIMB);
> 38:
> 39: private static final long[] one = new long[] {
If the method get1() is not overridden, this is unnecessary.
src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
line 42:
> 40: 0x0000000000000001L, 0x0000000000000000L, 0x0000000000000000L,
> 41: 0x0000000000000000L, 0x0000000000000000L };
> 42: private static final long[] zero = new long[] {
If the method get0() is not overridden, this is unnecessary.
src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
line 60:
> 58:
> 59: @Override
> 60: public ImmutableElement get0() {
Is it necessary to override this method?
src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
line 65:
> 63:
> 64: @Override
> 65: public ImmutableElement get1() {
Is it necessary to override this method?
src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
line 83:
> 81: // Overriden for performance (unroll and unnesting)
> 82: @Override
> 83: protected void carry(long[] limbs) {
Have you tried working with all positive limbs (as in
IntegerPolynomial1305.java)?
src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
line 327:
> 325:
> 326: @Override
> 327: protected void square(long[] a, long[] r) {
I suggest changing the square() method to a more efficient one by computing the
a_i * a_j (i != j) products only once and addig the result twice to the
appropriate places in the result. Something like this (not tested):
@Override
protected void square(long[] a, long[] r) {
long aa0 = a[0];
long aa1 = a[1];
long aa2 = a[2];
long aa3 = a[3];
long aa4 = a[4];
final long shift1 = 64 - BITS_PER_LIMB;
final long shift2 = BITS_PER_LIMB;
long d0, d1, d2, d3, d4; // low digits from multiplication
long dd0, dd1, dd2, dd3, dd4; // high digits from multiplication
// multiplication result digits for each column
long c0, c1, c2, c3, c4, c5, c6, c7, c8, c9;
// Row 0 - multiply by aa0
d0 = aa0 * aa0;
dd0 = Math.multiplyHigh(aa0, aa0) << shift1 | (d0 >>> shift2);
d0 &= LIMB_MASK;
d1 = aa0 * aa1;
dd1 = Math.multiplyHigh(aa0, bb1) << shift1 | (d1 >>> shift2);
d1 &= LIMB_MASK;
d2 = aa0 * aa2;
dd2 = Math.multiplyHigh(aa0, bb2) << shift1 | (d2 >>> shift2);
d2 &= LIMB_MASK;
d3 = aa0 * aa3;
dd3 = Math.multiplyHigh(aa0, bb3) << shift1 | (d3 >>> shift2);
d3 &= LIMB_MASK;
d4 = aa0 * aa4;
dd4 = Math.multiplyHigh(aa0, bb4) << shift1 | (d4 >>> shift2);
d4 &= LIMB_MASK;
c0 = d0;
c1 = (d1 << 1) + dd0;
c2 = (d2 + dd1) << 1;
c3 = (d3 + dd2) << 1;
c4 = (d4 + dd3) << 1;
c5 = dd4 << 1;
// Row 1 - multiply by aa1
d1 = aa1 * aa1;
dd1 = Math.multiplyHigh(aa1, bb1) << shift1 | (d1 >>> shift2);
d1 &= LIMB_MASK;
d2 = aa1 * aa2;
dd2 = Math.multiplyHigh(aa1, bb2) << shift1 | (d2 >>> shift2);
d2 &= LIMB_MASK;
d3 = aa1 * aa3;
dd3 = Math.multiplyHigh(aa1, bb3) << shift1 | (d3 >>> shift2);
d3 &= LIMB_MASK;
d4 = aa1 * aa4;
dd4 = Math.multiplyHigh(aa1, bb4) << shift1 | (d4 >>> shift2);
d4 &= LIMB_MASK;
c2 += d1;
c3 += (d2 << 1) + dd1;
c4 += (d3 + dd2) << 1;
c5 += (d4 + dd3) << 1;
c6 = dd4 << 1;
// Row 2 - multiply by aa2
d2 = aa2 * aa2;
dd2 = Math.multiplyHigh(aa2, bb2) << shift1 | (d2 >>> shift2);
d2 &= LIMB_MASK;
d3 = aa2 * aa3;
dd3 = Math.multiplyHigh(aa2, bb3) << shift1 | (d3 >>> shift2);
d3 &= LIMB_MASK;
d4 = aa2 * aa4;
dd4 = Math.multiplyHigh(aa2, bb4) << shift1 | (d4 >>> shift2);
d4 &= LIMB_MASK;
c4 += d2;
c5 += (d3 << 1) + dd2;
c6 += (d4 + dd3) << 1;
c7 = dd4 << 1;
// Row 3 - multiply by aa3
d3 = aa3 * aa3;
dd3 = Math.multiplyHigh(aa3, bb3) << shift1 | (d3 >>> shift2);
d3 &= LIMB_MASK;
d4 = aa3 * aa4;
dd4 = Math.multiplyHigh(aa3, bb4) << shift1 | (d4 >>> shift2);
d4 &= LIMB_MASK;
c6 += d3;
c7 += (d4 << 1) + dd3;
c8 = dd4 << 1;
// Row 4 - multiply by aa4
d4 = aa4 * aa4;
dd4 = Math.multiplyHigh(aa4, bb4) << shift1 | (d4 >>> shift2);
d4 &= LIMB_MASK;
c8 += d4;
c9 = dd4;
// Perform pseudo-Mersenne reduction
r[0] = c0 + (19 * c5);
r[1] = c1 + (19 * c6);
r[2] = c2 + (19 * c7);
r[3] = c3 + (19 * c8);
r[4] = c4 + (19 * c9);
reduce(r);
}
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29981#issuecomment-3984615568
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2872567722
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2872568897
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2872390257
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2872391902
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2872333415
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2872319370