On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons <sgibb...@openjdk.org> wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall performance of
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>>
>> Tested with tier-1 (and full CI). I've added a table of the before and
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>>
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Long to short jmp; other cleanup

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530:

> 2528:     switch (type) {
> 2529:       case USM_SHORT:
> 2530:         __ movw(Address(dest, (2 * i)), wide_value);

MOVW emits an extra operand-size override prefix byte compared to 32- and 64-bit stores; is there any specific reason for keeping the same unroll factor for all the stores?

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539:

> 2537:         break;
> 2538:       }
> 2539:     }

I understand we want to be as accurate as possible when filling the tail in the event of a SIGBUS, but we already create a wide value of 8 packed bytes when the destination segment is quadword aligned, and aligned quadword stores are implicitly atomic on x86 targets. What are your thoughts on using a vector-instruction-based loop?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573297441
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573299069
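As a rough illustration of the wide-value / aligned-store idea discussed in the second comment (not the actual stub code from the PR), a minimal standalone C++ sketch is shown below. The function name `set_memory_wide` and the use of `std::memcpy` for the quadword store are assumptions chosen for readability; a vector variant would broadcast the same byte into an XMM/YMM register (e.g. via `vpbroadcastb`) and use 16- or 32-byte stores in the bulk loop.

```c++
// Hypothetical sketch, not HotSpot code: broadcast the fill byte into a
// 64-bit wide value, use aligned quadword stores for the bulk of the
// buffer, and fall back to byte stores for the unaligned head and tail.
#include <cstdint>
#include <cstring>
#include <cstdio>

static void set_memory_wide(void* dest, size_t size, uint8_t value) {
  uint8_t* p = static_cast<uint8_t*>(dest);
  // Replicate the byte into all eight lanes of a 64-bit word.
  uint64_t wide = value * 0x0101010101010101ULL;

  // Head: byte stores until the destination is 8-byte aligned.
  while (size > 0 && (reinterpret_cast<uintptr_t>(p) & 7) != 0) {
    *p++ = value;
    size--;
  }
  // Bulk: aligned quadword stores (implicitly atomic on x86-64).
  while (size >= 8) {
    std::memcpy(p, &wide, 8);   // compiles to a single 64-bit store
    p += 8;
    size -= 8;
  }
  // Tail: remaining 0..7 bytes as byte stores.
  while (size > 0) {
    *p++ = value;
    size--;
  }
}

int main() {
  uint8_t buf[37];
  set_memory_wide(buf, sizeof(buf), 0xAB);
  printf("%02x %02x\n", buf[0], buf[36]);   // prints "ab ab"
  return 0;
}
```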