On Sat, 20 Apr 2024 14:14:59 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Long to short jmp; other cleanup
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530:
> 
>> 2528:     switch (type) {
>> 2529:       case USM_SHORT:
>> 2530:         __ movw(Address(dest, (2 * i)), wide_value);
> 
> MOVW emits an extra operand-size override prefix byte compared to 32- and 
> 64-bit stores. Is there any specific reason for keeping the same unroll 
> factor for all the stores?

My understanding is that the spec requires the appropriately sized write based 
on the alignment and size of the region.  This is why there are no 128-bit or 
256-bit store loops.
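
To make the constraint concrete, here is a rough, self-contained C++ model of 
the case where the spec selects 16-bit units: each unit is written with exactly 
one 16-bit store, never a wider one. This is only an illustration, not the 
stub generator code; fill_shorts and its parameters are made up for this 
sketch.

    #include <cstddef>
    #include <cstdint>

    // Each unit is written with exactly one 16-bit store (analogous to the
    // movw in the quoted snippet), never a wider store spanning units.
    static void fill_shorts(uint16_t* dest, size_t count, uint8_t byte_value) {
      // Replicate the fill byte into a 16-bit pattern, analogous to the
      // wide_value register in the quoted snippet.
      const uint16_t wide_value = static_cast<uint16_t>(byte_value * 0x0101u);
      for (size_t i = 0; i < count; i++) {
        dest[i] = wide_value;   // one element-sized store per iteration
      }
    }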

> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539:
> 
>> 2537:         break;
>> 2538:     }
>> 2539:   }
> 
> I understand we want to be as accurate as possible in filling the tail in the 
> event of a SIGBUS, but we already create a wide value of 8 packed bytes when 
> the destination segment is quadword aligned, and aligned quadword stores are 
> implicitly atomic on x86 targets. What are your thoughts on using a 
> vector-instruction-based loop?

I believe the spec is specific about the size of the store required for a given 
alignment and size.  I want to honor that spec even though wider stores could 
be used in many cases.
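
For contrast, a rough C++ model of the suggested vector-style loop might look 
like the sketch below. The name fill_bytes_vectorized and its structure are 
mine, not from the PR; it is shown only to illustrate the kind of wider store 
that the element-sized approach deliberately avoids.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Rough model of a vector-style fill: replicate the fill byte into a
    // 16-byte pattern and store 16 bytes per iteration once aligned. A
    // compiler would typically emit SSE/AVX stores for the memcpy below.
    static void fill_bytes_vectorized(uint8_t* dest, size_t count,
                                      uint8_t byte_value) {
      uint8_t pattern[16];
      std::memset(pattern, byte_value, sizeof(pattern));
      size_t i = 0;
      // Head: byte stores until the destination is 16-byte aligned.
      while (i < count && (reinterpret_cast<uintptr_t>(dest + i) & 15) != 0) {
        dest[i++] = byte_value;
      }
      // Body: 16-byte (vector-width) stores.
      for (; i + 16 <= count; i += 16) {
        std::memcpy(dest + i, pattern, sizeof(pattern));
      }
      // Tail: remaining bytes one at a time.
      for (; i < count; i++) {
        dest[i] = byte_value;
      }
    }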

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573373720
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573374108
