On Tue, 3 Sep 2024 15:22:36 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> 
wrote:

>> This PR proposes to handle smaller FFM copy operations with Java code rather 
>> than transitioning to native code. This will improve performance. In this 
>> PR, copy operations involving zero to 63 bytes will be handled by Java code.
>> 
>> Here is what it looks like for Windows x64:
>> 
>> ![image](https://github.com/user-attachments/assets/6b31206e-3b24-4b34-bf38-a1be393186d3)
>> 
>> Here is another chart for Linux a64:
>> 
>> ![image](https://github.com/user-attachments/assets/b679bfac-670a-42a5-802b-2b17adf5ec79)
>> 
>> Other platforms exhibit similar behavior. It should be noted that the gain 
>> with this PR is pronounced for certain common sizes that are more likely to 
>> appear in code (e.g. 8, 16, 24, and 32)
>> 
>> It would be possible to use the same code path for the 7arg  
>> `MemorySegment::copy` method if it is similar to:
>> 
>> 
>> MemorySegment.copy(heapSrcSegment, JAVA_BYTE, 0, heapDstSegment, JAVA_BYTE, 
>> 0, ELEM_SIZE);
>> 
>> 
>> This could be added in a separate PR.
>> 
>> This PR has been tested with tier1-3 and passed.
>
> src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java
>  line 637:
> 
>> 635:             for (; offset < limit; offset += 8) {
>> 636:                 final long v = 
>> SCOPED_MEMORY_ACCESS.getLong(src.sessionImpl(), src.unsafeGetBase(), 
>> src.unsafeGetOffset() + srcOffset + offset);
>> 637:                 SCOPED_MEMORY_ACCESS.putLong(dst.sessionImpl(), 
>> dst.unsafeGetBase(), dst.unsafeGetOffset() + dstOffset + offset, v);
> 
> I've been digging a bit deeper on why we can get away w/o aligned access 
> (both here and for `fill`). It turns out that, conveniently, var handles use 
> `Unsafe::getXYZUnaligned` for plain set/get. This means that the intrinsics 
> will deal with unaligned access accordingly, depending on the platform 
> (either by splitting into multiple accesses, or by issuing an unaligned 
> access on platforms that support that).
> 
> This means we can write "simple" code, and expect C2 to rewrite it the most 
> optimal way. We should probably add a comment in this direction (both for 
> `fill` and `copy`.

Separately, I wonder if it would make sense to put all these "tricky" routines 
off to one side - e.g. `BulkSegmentSupport`, so that we don't make 
`AbstractMemorySegmentImpl` even bigger than it is.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20829#discussion_r1742262076

Reply via email to