On Tue, 23 Jun 2026 10:26:51 GMT, Aleksey Shipilev <[email protected]> wrote:

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 12926:
>> 
>>> 12924: 
>>> 12925:     if (UseIntPoly25519Intrinsics) {
>>> 12926:       StubRoutines::_intpoly_mult_25519 = generate_intpoly_mult_25519();
>> 
>> I am looking at x86 code for this, and that architecture implements both 
>> _mult_ and _square_ intrinsics. First of all, this is inconsistent. But 
>> second, are we leaving the actual performance on the table here?
>
> Ah, I am seeing the note in PR: "Note 1: The difference between Aarch64 vs. 
> x86_64 intrinsics implementation include the lack of square() intrinsics; 
> usage caused a 3.3% performance regression due to the efficiencies of the 
> symmetric squaring shape in Java vs. the inefficiencies of the leaf calls and 
> the additional cycles required for 64 bit multiplication in Aarch64."
> 
> All right. But at least document it right in this hunk, so that it is clear 
> the square omission is intentional.

Done.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31409#discussion_r3463518217
