On Tue, 23 Jun 2026 10:26:51 GMT, Aleksey Shipilev <[email protected]> wrote:
>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 12926:
>>
>>> 12924:
>>> 12925: if (UseIntPoly25519Intrinsics) {
>>> 12926: StubRoutines::_intpoly_mult_25519 = generate_intpoly_mult_25519();
>>
>> I am looking at x86 code for this, and that architecture implements both
>> _mult_ and _square_ intrinsics. First of all, this is inconsistent. But
>> second, are we leaving the actual performance on the table here?
>
> Ah, I am seeing the note in the PR: "Note 1: The difference between Aarch64 vs.
> x86_64 intrinsics implementation include the lack of square() intrinsics;
> usage caused a 3.3% performance regression due to the efficiencies of the
> symmetric squaring shape in Java vs. the inefficiencies of the leaf calls and
> the additional cycles required for 64 bit multiplication in Aarch64."
>
> All right. But at least document it right in this hunk, so that it is clear
> the square omission is intentional.
Done.
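
For readers wondering what "symmetric squaring shape" means in the quoted note, here is a minimal Java sketch. It is an illustration only, not the JDK's IntegerPolynomial code: the class name, the five-limb input, and the `products` counter are all hypothetical, and modular reduction and carry handling are left out. It shows that a schoolbook multiply needs n*n limb products, while squaring can compute each cross term once and double it, needing only n*(n+1)/2.

```java
import java.util.Arrays;

// Hypothetical illustration; not the JDK implementation.
public class SquareShapeSketch {
    // Counts limb multiplications, for illustration only.
    static long products;

    // Schoolbook product of two limb arrays: a.length * b.length limb products.
    static long[] mult(long[] a, long[] b) {
        long[] r = new long[a.length + b.length - 1];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                r[i + j] += a[i] * b[j];
                products++;
            }
        }
        return r;
    }

    // Squaring: a[i]*a[j] equals a[j]*a[i], so each cross term is
    // computed once and doubled, giving n*(n+1)/2 limb products.
    static long[] square(long[] a) {
        long[] r = new long[2 * a.length - 1];
        for (int i = 0; i < a.length; i++) {
            r[2 * i] += a[i] * a[i];
            products++;
            for (int j = i + 1; j < a.length; j++) {
                r[i + j] += 2 * (a[i] * a[j]);
                products++;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        long[] a = {3, 1, 4, 1, 5}; // arbitrary small limbs, no overflow

        products = 0;
        long[] viaMult = mult(a, a);
        long multProducts = products;

        products = 0;
        long[] viaSquare = square(a);
        long squareProducts = products;

        System.out.println("same result: " + Arrays.equals(viaMult, viaSquare));
        System.out.println("limb products: " + multProducts + " vs " + squareProducts);
    }
}
```

With five limbs that is 25 limb products for the general multiply against 15 for the square. The JIT can already exploit this shape in plain Java, which is consistent with the PR note that a square() leaf call did not pay off on AArch64.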
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/31409#discussion_r3463518217