Re: RFR: 8378893: X25519 should utilize a larger limb size [v2]

Shawn Emery Wed, 04 Mar 2026 08:08:44 -0800

On Mon, 2 Mar 2026 14:05:34 GMT, Ferenc Rakoczi <[email protected]> wrote:


> It would be nice if you could add comments to the methods about their 
> requirements for the inputs and what they produce as a result (even in the 
> ancestor classes).

I've taken a first round at this, let me know if I understood you correctly.

> src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
>  line 39:
> 
>> 37:     private static final long LIMB_MASK = -1L >>> (64 - BITS_PER_LIMB);
>> 38: 
>> 39:     private static final long[] one = new long[] {
> 
> If the method get1() is not overridden, this is unnecessary.

Done.

> src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
>  line 42:
> 
>> 40:         0x0000000000000001L, 0x0000000000000000L, 0x0000000000000000L,
>> 41:         0x0000000000000000L, 0x0000000000000000L };
>> 42:     private static final long[] zero = new long[] {
> 
> If the method get0() is not overridden, this is unnecessary.

Done.

> src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
>  line 60:
> 
>> 58: 
>> 59:     @Override
>> 60:     public ImmutableElement get0() {
> 
> Is it necessary to override this method?

Good catch, removed!

> src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
>  line 65:
> 
>> 63: 
>> 64:     @Override
>> 65:     public ImmutableElement get1() {
> 
> Is it necessary to override this method?

Good catch, removed!

> src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
>  line 83:
> 
>> 81:     // Overriden for performance (unroll and unnesting)
>> 82:     @Override
>> 83:     protected void carry(long[] limbs) {
> 
> Have you tried working with all positive limbs (as in 
> IntegerPolynomial1305.java)?

I had initially, but had to Override a number of methods because the superclass 
assumes that the limbs are signed.  I assume to keep the limbs signed given 
that there is no difference in performance.

> src/java.base/share/classes/sun/security/util/math/intpoly/IntegerPolynomial25519.java
>  line 327:
> 
>> 325: 
>> 326:     @Override
>> 327:     protected void square(long[] a, long[] r) {
> 
> I suggest changing the square() method to a more efficient one by computing 
> the a_i * a_j (i != j) products only once and addig the result twice to the 
> appropriate places in the result. Something like this (not tested):
> 
> 
>     @Override
>     protected void square(long[] a, long[] r) {
>         long aa0 = a[0];
>         long aa1 = a[1];
>         long aa2 = a[2];
>         long aa3 = a[3];
>         long aa4 = a[4];
> 
>         final long shift1 = 64 - BITS_PER_LIMB;
>         final long shift2 = BITS_PER_LIMB;
> 
>         long d0, d1, d2, d3, d4;      // low digits from multiplication
>         long dd0, dd1, dd2, dd3, dd4; // high digits from multiplication
>         // multiplication result digits for each column
>         long c0, c1, c2, c3, c4, c5, c6, c7, c8, c9;
> 
>         // Row 0 - multiply by aa0
>         d0 = aa0 * aa0;
>         dd0 = Math.multiplyHigh(aa0, aa0) << shift1 | (d0 >>> shift2);
>         d0 &= LIMB_MASK;
> 
>         d1 = aa0 * aa1;
>         dd1 = Math.multiplyHigh(aa0, bb1) << shift1 | (d1 >>> shift2);
>         d1 &= LIMB_MASK;
> 
>         d2 = aa0 * aa2;
>         dd2 = Math.multiplyHigh(aa0, bb2) << shift1 | (d2 >>> shift2);
>         d2 &= LIMB_MASK;
> 
>         d3 = aa0 * aa3;
>         dd3 = Math.multiplyHigh(aa0, bb3) << shift1 | (d3 >>> shift2);
>         d3 &= LIMB_MASK;
> 
>         d4 = aa0 * aa4;
>         dd4 = Math.multiplyHigh(aa0, bb4) << shift1 | (d4 >>> shift2);
>         d4 &= LIMB_MASK;
> 
>         c0 = d0;
>         c1 = (d1 << 1) + dd0;
>         c2 = (d2 + dd1) << 1;
>         c3 = (d3 + dd2) << 1;
>         c4 = (d4 + dd3) << 1;
>         c5 = dd4 << 1;
> 
>         // Row 1 - multiply by aa1
>         d1 = aa1 * aa1;
>         dd1 = Math.multiplyHigh(aa1, bb1) << shift1 | (d1 >>> shift2);
>         d1 &= LIMB_MASK;
> 
>         d2 = aa1 * aa2;
>         dd2 = Math.multiplyHigh(aa1, bb2) << shift1 | (d2 >>> shift2);
>         d2 &= LIMB_MASK;
> 
>         d3 = aa1 * aa3;
>         dd3 = Math.multiplyHigh(aa1, bb3) << shift1 | (d3 >>> shift2);
>         d3 &= LIMB_MASK;
> 
>         d4 = aa1 * aa4;
>         dd4 = Math.multiplyHigh(aa1, bb4) << shift1 | (d4 >>> shift2);
>         d4 &= LIMB_MASK;
> 
>         c2 += d1;
>         c3 += (d2 << 1) + dd1;
>         c4 += (d3 + dd2) << 1;
>         c5 += (d4 + dd3) << 1;
>         c6 = dd4 << 1;
> 
>         // Row 2 - multiply by aa2
>         d2 = aa2 * aa2;
>         dd2 = Math.multiplyHigh(aa2, bb2) << s...

Good suggestion!  The square() optimization with a few tweaks increased 
performance by 17% for the key generation, encapsulation, and decapsulation.  
In combination of an increase in the maximum additions to one (which previously 
required one or two reductions per polynomial operation), the total increase 
performance from baseline is 41%!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29981#issuecomment-3996153219
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2882581207
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2882581624
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2882580448
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2882580734
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2882580108
PR Review Comment: https://git.openjdk.org/jdk/pull/29981#discussion_r2882579466

Re: RFR: 8378893: X25519 should utilize a larger limb size [v2]

Reply via email to