On Tue, 3 Sep 2024 15:22:36 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:
>> This PR proposes to handle smaller FFM copy operations with Java code rather >> than transitioning to native code. This will improve performance. In this >> PR, copy operations involving zero to 63 bytes will be handled by Java code. >> >> Here is what it looks like for Windows x64: >> >>  >> >> Here is another chart for Linux a64: >> >>  >> >> Other platforms exhibit similar behavior. It should be noted that the gain >> with this PR is pronounced for certain common sizes that are more likely to >> appear in code (e.g. 8, 16, 24, and 32) >> >> It would be possible to use the same code path for the 7arg >> `MemorySegment::copy` method if it is similar to: >> >> >> MemorySegment.copy(heapSrcSegment, JAVA_BYTE, 0, heapDstSegment, JAVA_BYTE, >> 0, ELEM_SIZE); >> >> >> This could be added in a separate PR. >> >> This PR has been tested with tier1-3 and passed. > > src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java > line 637: > >> 635: for (; offset < limit; offset += 8) { >> 636: final long v = >> SCOPED_MEMORY_ACCESS.getLong(src.sessionImpl(), src.unsafeGetBase(), >> src.unsafeGetOffset() + srcOffset + offset); >> 637: SCOPED_MEMORY_ACCESS.putLong(dst.sessionImpl(), >> dst.unsafeGetBase(), dst.unsafeGetOffset() + dstOffset + offset, v); > > I've been digging a bit deeper on why we can get away w/o aligned access > (both here and for `fill`). It turns out that, conveniently, var handles use > `Unsafe::getXYZUnaligned` for plain set/get. This means that the intrinsics > will deal with unaligned access accordingly, depending on the platform > (either by splitting into multiple accesses, or by issuing an unaligned > access on platforms that support that). > > This means we can write "simple" code, and expect C2 to rewrite it the most > optimal way. We should probably add a comment in this direction (both for > `fill` and `copy`. Separately, I wonder if it would make sense to put all these "tricky" routines off to one side - e.g. `BulkSegmentSupport`, so that we don't make `AbstractMemorySegmentImpl` even bigger than it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20829#discussion_r1742262076