On Tue, 19 May 2026 08:27:53 GMT, Andrew Haley <[email protected]> wrote:

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7758:
>>
>>> 7756:     __ lsr(tmp, lo, montMulP256Shift2);
>>> 7757:     __ orr(hi, hi, tmp);
>>> 7758:     __ andr(lo, lo, mask);
>>
>> Suggestion:
>>
>>     // compute 104-bit (40 + 64) full product
>>     __ umulh(hi, a, b);
>>     __ mul(lo, a, b);
>>     // combine 40 + 12 bits into hi result
>>     __ lsl(hi, hi, montMulP256Shift1);
>>     __ lsr(tmp, lo, montMulP256Shift2);
>>     __ orr(hi, hi, tmp);
>>     // mask off 52 bits of lo result
>>     __ andr(lo, lo, mask);
>
> It might be better and clearer to use `bfm` rather than shifting, masking,
> and ORing.

I added the comments. As for `bfm`: it saves one instruction, but to me it is less intuitive than the shift-and-OR sequence.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288539803
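
For anyone following the arithmetic, here is a minimal C++ model of the shift/OR/mask sequence quoted above. It is only a sketch, not the stub code. The values 12 and 52 for `montMulP256Shift1` and `montMulP256Shift2` are assumptions inferred from the comments ("40 + 12 bits", "52 bits of lo"), and `mul52`, `Limbs`, and the `k...` constants are hypothetical names. It relies on the `unsigned __int128` extension in GCC/Clang and does not model the `bfm` alternative.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values: the patch names these constants; the values here are
// inferred from the review comments, not taken from the source.
constexpr unsigned kShift1 = 12;  // stands in for montMulP256Shift1
constexpr unsigned kShift2 = 52;  // stands in for montMulP256Shift2
constexpr uint64_t kMask52 = (uint64_t{1} << 52) - 1;

struct Limbs {
  uint64_t hi;  // bits 52..103 of the product
  uint64_t lo;  // bits 0..51 of the product
};

// Multiply two 52-bit limbs and split the 104-bit product into two
// 52-bit halves, mirroring the instruction sequence step by step.
inline Limbs mul52(uint64_t a, uint64_t b) {
  assert(a <= kMask52 && b <= kMask52);
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  uint64_t hi = static_cast<uint64_t>(p >> 64);  // umulh: top 40 bits
  uint64_t lo = static_cast<uint64_t>(p);        // mul: low 64 bits
  hi <<= kShift1;                                // lsl: make room for 12 bits
  uint64_t tmp = lo >> kShift2;                  // lsr: top 12 bits of lo
  hi |= tmp;                                     // orr: 40 + 12 bits
  lo &= kMask52;                                 // andr: keep low 52 bits
  return {hi, lo};
}
```

With these assumed values, `(hi << 52) | lo` reconstructs the full product for any pair of 52-bit inputs, which is the property the added comments describe.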
