On Wed, 17 Jun 2026 00:02:25 GMT, Shawn Emery <[email protected]> wrote:
>> Curve25519 polynomial arithmetic is performed with intrinsics implemented
>> in GPR related instructions for multiplication operations (method mult()).
>> Benchmark improvements include:
>>
>> X25519 decapsulation: +9%
>> X25519 encapsulation: +9%
>> X25519 key agreement: +7%
>> X25519 key-pair generation: +10%
>> X25519-MLKEM decapsulation: +7%
>> X25519-MLKEM encapsulation: +8%
>> X25519-MLKEM key-pair generation: +8%
>> EdDSA sign: +12%
>> EdDSA verify: +12%
>> EdDSA key-pair generation: +15%
>>
>> Note 1: The difference between Aarch64 vs. x86_64 intrinsics implementation
>> include the lack of square() intrinsics; usage caused a 3.3% performance
>> regression due to the efficiencies of the symmetric squaring shape in Java
>> vs. the inefficiencies of the leaf calls and the additional cycles required
>> for 64 bit multiplication in Aarch64.
>> Note 2: The GPR related instructions were optimal when compared to hybrid
>> (GPR related instructions for the first two iterations and Neon instructions
>> for the last two iterations) solution. This design produced a -4%/-1%
>> performance drop in KEM decapsulation/encapsulation compared to the GPR
>> related instructions where the overhead of performing the limb splits and
>> reconstruction did not compensate enough for the efficiencies of SIMD
>> parallelism.
>>
>> ---------
>> - [X] I confirm that I make this contribution in accordance with the
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Shawn Emery has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Update based on adinn's comments

Please merge from master to get Windows AArch64 tested.
I suspect it will show up the following issues:

Ah, Windows AArch64 already _does_ show the problem I noted above:

c:\a\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(7721): error C2220: the following warning is treated as an error
c:\a\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(7721): warning C4293: '<<': shift count negative or too big, undefined behavior

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7677:

> 7675:
> 7676: /**
> 7677: * Arithmetic polynomial multiplicaiton in Curve25519. The algorithm
> mimics

Suggestion:

* Arithmetic polynomial multiplication in Curve25519. The algorithm mimics

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7721:

> 7719: const int32_t columns = limbs * 2;
> 7720: const uint64_t mask = -1UL >> rem;
> 7721: const uint64_t CARRY_ADD = 1UL << (bpl - 1);

On MSVC, `UL` is `unsigned long`, which is only 32 bits wide there. So shifting `<< 51` breaks.

-------------

PR Review: https://git.openjdk.org/jdk/pull/31409#pullrequestreview-4541898537
PR Comment: https://git.openjdk.org/jdk/pull/31409#issuecomment-4765874405
PR Review Comment: https://git.openjdk.org/jdk/pull/31409#discussion_r3450421727
PR Review Comment: https://git.openjdk.org/jdk/pull/31409#discussion_r3450419179
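To illustrate the `UL` problem at lines 7720 and 7721, here is a minimal standalone sketch of a width-safe way to build the two constants. It is not the patch itself: the helper names `carry_add` and `low_mask` are hypothetical, and the values `bpl = 51` and `rem = 13` are assumptions chosen for illustration only. The idea is to start from an operand whose type is explicitly 64 bits wide, so the shift does not depend on the platform's width of `unsigned long`.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; these are not identifiers from
// stubGenerator_aarch64.cpp.

// Builds the rounding constant with a single bit set at position bpl - 1.
// std::uint64_t{1} is 64 bits wide on every platform, whereas 1UL is only
// 32 bits wide under MSVC, where a shift count of 32 or more is undefined.
constexpr std::uint64_t carry_add(int bpl) {
  return std::uint64_t{1} << (bpl - 1);
}

// Builds a mask of the low (64 - rem) bits. ~std::uint64_t{0} replaces
// -1UL, which would yield only 32 set bits where unsigned long is 32 bits.
constexpr std::uint64_t low_mask(int rem) {
  return ~std::uint64_t{0} >> rem;
}

// Compile-time checks using the assumed values bpl = 51 and rem = 13.
static_assert(carry_add(51) == 0x4000000000000ULL, "bit 50 is set");
static_assert(low_mask(13) == 0x7FFFFFFFFFFFFULL, "low 51 bits are set");
```

A `ULL` suffix would also work for the literal, since `unsigned long long` is at least 64 bits wide; the explicit `std::uint64_t` form just keeps the operand type identical to the declared type of the constant.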
