On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons <sgibb...@openjdk.org> wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall performance of
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>>
>> Tested with tier-1 (and full CI). I've added a table of the before and
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>>
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Long to short jmp; other cleanup

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530:

> 2528:     switch (type) {
> 2529:       case USM_SHORT:
> 2530:         __ movw(Address(dest, (2 * i)), wide_value);

MOVW emits an extra operand-size override prefix byte compared to 32- and 64-bit stores; is there any specific reason for keeping the same unroll factor for all the stores?

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539:

> 2537:         break;
> 2538:       }
> 2539:     }

I understand we want to be as accurate as possible when filling the tail in the event of a SIGBUS, but we already create a wide value of 8 packed bytes when the destination segment is quadword aligned, and aligned quadword stores are implicitly atomic on x86 targets. What are your thoughts on using a vector-instruction-based loop?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573297441
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573299069
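As a rough illustration of the wide-value / aligned-store idea discussed in the second comment (not the actual stub code from the PR), a minimal standalone C++ sketch is shown below. The function name `set_memory_wide` and the use of `std::memcpy` for the quadword store are assumptions chosen for readability; a vector variant would broadcast the same byte into an XMM/YMM register (e.g. via `vpbroadcastb`) and use 16- or 32-byte stores in the bulk loop.

```c++
// Hypothetical sketch, not HotSpot code: broadcast the fill byte into a
// 64-bit wide value, use aligned quadword stores for the bulk of the
// buffer, and fall back to byte stores for the unaligned head and tail.
#include <cstdint>
#include <cstring>
#include <cstdio>

static void set_memory_wide(void* dest, size_t size, uint8_t value) {
  uint8_t* p = static_cast<uint8_t*>(dest);
  // Replicate the byte into all eight lanes of a 64-bit word.
  uint64_t wide = value * 0x0101010101010101ULL;

  // Head: byte stores until the destination is 8-byte aligned.
  while (size > 0 && (reinterpret_cast<uintptr_t>(p) & 7) != 0) {
    *p++ = value;
    size--;
  }
  // Bulk: aligned quadword stores (implicitly atomic on x86-64).
  while (size >= 8) {
    std::memcpy(p, &wide, 8);   // compiles to a single 64-bit store
    p += 8;
    size -= 8;
  }
  // Tail: remaining 0..7 bytes as byte stores.
  while (size > 0) {
    *p++ = value;
    size--;
  }
}

int main() {
  uint8_t buf[37];
  set_memory_wide(buf, sizeof(buf), 0xAB);
  printf("%02x %02x\n", buf[0], buf[36]);   // prints "ab ab"
  return 0;
}
```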