On Sat, 20 Apr 2024 14:14:59 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>>  Long to short jmp; other cleanup
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530:
>
>> 2528:     switch (type) {
>> 2529:       case USM_SHORT:
>> 2530:         __ movw(Address(dest, (2 * i)), wide_value);
>
> MOVW emits an extra Operand Size Override prefix byte compared to 32- and
> 64-bit stores; any specific reason for keeping the same unroll factor for
> all the stores?

My understanding is that the spec requires the appropriate-sized write based on alignment and size. This is why there are no 128-bit or 256-bit store loops.

> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539:
>
>> 2537:         break;
>> 2538:       }
>> 2539:     }
>
> I understand we want to be as accurate as possible in filling the tail in the
> event of SIGBUS, but we are anyway creating a wide value for 8 packed bytes
> if the destination segment was quadword aligned, and aligned quadword stores
> are implicitly atomic on x86 targets. What are your thoughts on using a
> vector-instruction-based loop?

I believe the spec is specific about the size of the store required given alignment and size. I want to honor that spec even though wider stores could be done in many cases.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573373720
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573374108
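For illustration, a minimal C++ sketch of the approach being discussed: fill the tail with one store per element, at exactly the element's width, instead of widening to 128-/256-bit (vector) stores. The names here (UsmType, fill_tail) are hypothetical and are not taken from the stub generator; this models the idea, not the HotSpot code itself.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Hypothetical element-type tags, loosely mirroring the USM_* cases above.
    enum class UsmType { Byte, Short, Int, Long };

    // Fill 'count' elements at 'dest' with the low bits of 'wide_value'.
    // Each element is written with a store of exactly its own width
    // (cf. movw for shorts), so no write is ever wider than one element.
    static void fill_tail(void* dest, UsmType type, uint64_t wide_value, size_t count) {
      uint8_t* p = static_cast<uint8_t*>(dest);
      for (size_t i = 0; i < count; i++) {
        switch (type) {
          case UsmType::Byte: {
            uint8_t v = static_cast<uint8_t>(wide_value);
            std::memcpy(p + i, &v, sizeof v);              // 8-bit store
            break;
          }
          case UsmType::Short: {
            uint16_t v = static_cast<uint16_t>(wide_value);
            std::memcpy(p + 2 * i, &v, sizeof v);          // 16-bit store
            break;
          }
          case UsmType::Int: {
            uint32_t v = static_cast<uint32_t>(wide_value);
            std::memcpy(p + 4 * i, &v, sizeof v);          // 32-bit store
            break;
          }
          case UsmType::Long: {
            std::memcpy(p + 8 * i, &wide_value, sizeof wide_value);  // 64-bit store
            break;
          }
        }
      }
    }

Keeping every write at the element's own width is what rules out a wider store loop in this sketch, even where the destination alignment would make wider stores legal and atomic.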