On Mon, 18 May 2026 15:06:41 GMT, Andrew Dinn <[email protected]> wrote:
>> Ferenc Rakoczi has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Accepting more suggestions from Andrew Dinn.
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7869:
>
>> 7867: __ ldr(a_i, __ post(a, 8));
>> 7868: gpr_partial_mult_52(a_i, b_0, high, low, tmp, limb_mask);
>> 7869: __ andr(n, low, limb_mask);
>
> It would be clearer to use `mov(n, low)` here or, at the very least `orr(n,
> zr, low)`. `low` has already been masked with `limb_mask`, which makes the
> `andr` look redundant when, in fact, the point is to save the initial value
> computed for `low`.
Accepted.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7873:
>
>> 7871: neon_partial_mult_64(B, b_highs, a_vals, 0);
>> 7872:
>> 7873: // Limb 0 cont
>
> Suggestion:
>
> // Limb 0 modulus computation
> // n.b. modulus computation requires multiplying successive
> // limbs of the product by corresponding limbs of the p256
> // prime adding the result to the limb and folding this
> // partial result into a running 256-bit sum in c_i. Limbs
> // of c_i are stored via c_ptr once carries are included.
> // n.b. the mul + add is omitted for limb 2 since the
> // corresponding prime bits are zero.
Accepted.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7899:
>
>> 7897: gpr_partial_mult_52(a_i, b_2, high, low, tmp, limb_mask);
>> 7898: __ add(c_i, c_i, low);
>> 7899: __ str(c_i, Address(c_ptr, 8));
>
> I don't really like the hard-coded numbers in these `c_ptr` address
> expressions. At the very least it would be better if they were be specified
> using a const expression like `nnn * BytesPerLong` to show that we are
> indexing an array of 64 bit data at index `nnn`.
>
> A better idea would be to define a static helper method that also checks the
> index is in range
>
>
> #define C_DATA_COUNT 5
> static int c_slot_offset(index i) {
> assert(0 <= i && i < C_SLOT_COUNT, "invalid index for c_i data");
> return i * BytesPerLong;
> }
>
>
> You can then define the address as `Address(c_ptr, c_offset(1))` etc.
>
> Also, once you have defined `C_DATA_COUNT` you can also define a constant
> C_DATA_SIZE for use when you extend the stack:
>
>
> #define C_DATA_SIZE align_up(C_DATA_COUNT, 2) * BytesPerLong;
>
> . . .
> __ sub(sp, sp, C_DATA_SIZE);
> . . .
>
>
> The same consideration applies with even more force as regards the hard-wired
> offsets used when accessing the Neon multiplication product data via `sp`.
>
>
> __ ldr(low_1, Address(sp));
> __ ldr(high_1, Address(sp, 16));
>
>
> This also allows you to fold the details of the data layout into the function:
>
>
> #define MUL_DATA_COUNT 8
> static int mul_data_offset(index i, boolean want_high) {
> assert(0 <= i && i < MUL_DATA_COUNT, "invalid index for mul data");
> // data is stored as successive low/high pairs
> // (l0, l1), (h0, h1), ... (l6, l7), (h6, h7)
> int pair_start = (i >> 1) * 4;
> int high_select = (want_high ? 2 : 0)
> int pair_element = (i & 1);
> int idx = pair_start + high_select + pair_element;
> return (idx * BytesPerLong);
> }
>
> . . .
> __ ldr(low_1, Address(sp, mul_data_offset(0, false));
> __ ldr(high_1, Address(sp, mul_data_offset(0, true)));
> . . .
>
>
> Likewise you can define a constant MUL_DATA_SIZE for use when you extend the
> stack:
>
>
> #define MUL_DATA_SIZE (MUL_DATA_COUNT * 2) * BytesPerLong;
>
> . . .
> __ sub(sp, sp, MUL_DATA_SIZE);
> . . .
I think this would be over-engineering. We are talking about fixed length
arrays here, k * BytesPerLong clearly enough indicates the index k in my
opinion (even with "precomputing" for indicies 0 and 1).
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8077:
>
>> 8075: + b_0 + b_1 + b_2;
>> 8076: b_0 = b_1 = b_2 = noreg;
>> 8077:
>
> This freeing of registers would be better done at line 8002 (i.e. before
> processing a3) and with register b_3 also freed. It highlights to the
> maintainer that b0 - b3 are no longer needed from that point on.
Done.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8267:
>
>> 8265: // P521OrderField: 19 = 8 + 8 + 2 + 1
>> 8266: // Special Cases 5, 10, 14, 16, 19
>> 8267:
>
> You need to insert the standard call `load_archive_data` here and return any
> stub that is found.
Done.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8377:
>
>> 8375: VSeq<4> b_vec(20);
>> 8376:
>> 8377: __ ldr(a9, Address(aLimbs, 64));
>
> It makes things more obvious to a maintainer and avoids mistakes should
> anything need patching if you use 8 * BytesPerLong here instead of 64 and
> likewise for all the other hard-wired Address offsets used throughout this
> method. That way the offsets directly relate to the vector and GPR counts
> mentioned in the comments.
Accepted.
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8568:
>
>> 8566: __ mov(r0, zr); // return 0
>> 8567: __ ret(lr);
>> 8568: return start;
>
> You need to call `store_archive_data` here.
Done.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288540269
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288541160
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288542197
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288542627
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288556389
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288557945
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288556748