On Tue, 23 Jun 2026 10:24:31 GMT, Aleksey Shipilev <[email protected]> wrote:

>> Shawn Emery has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains five additional 
>> commits since the last revision:
>> 
>>  - Update based on shipilev's comments
>>  - Merge with mainline
>>  - Update based on adinn's comments
>>  - Merge with master branch
>>  - 8385304: X25519 should utilize aarch64 intrinsics
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 12926:
> 
>> 12924: 
>> 12925:     if (UseIntPoly25519Intrinsics) {
>> 12926:       StubRoutines::_intpoly_mult_25519 = 
>> generate_intpoly_mult_25519();
> 
> I am looking at x86 code for this, and that architecture implements both 
> _mult_ and _square_ intrinsics. First of all, this is inconsistent. But 
> second, are we leaving the actual performance on the table here?

Ah, I see the note in the PR: "Note 1: The difference between Aarch64 vs. 
x86_64 intrinsics implementation include the lack of square() intrinsics; usage 
caused a 3.3% performance regression due to the efficiencies of the symmetric 
squaring shape in Java vs. the inefficiencies of the leaf calls and the 
additional cycles required for 64 bit multiplication in Aarch64."

All right. But please at least document that in a comment right in this hunk,
so that it is clear that omitting the square intrinsic is intentional.
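
For anyone reading along who wonders what the "symmetric squaring shape" buys,
here is a minimal sketch. It assumes the phrase refers to the usual trick of
computing each cross term a[i]*a[j] once and doubling it. This is not the JDK
code: the limb count N, the types, and the function names are all made up for
illustration, and there is no carry handling or modular reduction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative limb count only; not taken from the JDK sources.
constexpr std::size_t N = 5;
using Limbs = std::array<std::uint64_t, N>;
using Product = std::array<std::uint64_t, 2 * N - 1>;

struct Result {
  Product r;       // unreduced product coefficients
  int multiplies;  // number of limb multiplications performed
};

// Generic schoolbook product: every a[i] meets every b[j], N * N multiplies.
Result mult(const Limbs& a, const Limbs& b) {
  Result out{};
  for (std::size_t i = 0; i < N; i++) {
    for (std::size_t j = 0; j < N; j++) {
      out.r[i + j] += a[i] * b[j];
      out.multiplies++;
    }
  }
  return out;
}

// Squaring: a[i]*a[j] == a[j]*a[i], so each cross term is computed once and
// doubled. That leaves N diagonal terms plus N*(N-1)/2 cross terms.
Result square(const Limbs& a) {
  Result out{};
  for (std::size_t i = 0; i < N; i++) {
    out.r[2 * i] += a[i] * a[i];
    out.multiplies++;
    for (std::size_t j = i + 1; j < N; j++) {
      out.r[i + j] += 2 * (a[i] * a[j]);
      out.multiplies++;
    }
  }
  return out;
}

// Sanity check: the specialized square agrees with the generic product.
void self_check() {
  Limbs a = {1, 2, 3, 4, 5};
  assert(square(a).r == mult(a, a).r);
}
```

With N = 5 the generic product does 25 limb multiplications and the square
does 15. That saving is available to a plain Java square as well, which would
be consistent with a separate square stub not paying for its leaf-call
overhead.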

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31409#discussion_r3458920122
