On Fri, 30 Aug 2024 09:09:57 GMT, Per Minborg <pminb...@openjdk.org> wrote:

>> The performance of `MemorySegment::fill` can be improved by replacing the 
>> `checkAccess()` method call with a call to `checkReadOnly()` instead (as the 
>> bounds of the segment itself do not need to be checked).
>> 
>> Also, smaller segments can be handled directly by Java code rather than 
>> transitioning to native code.
>> 
>> Here is how the `MemorySegment::fill` performance is improved by this PR:
>> 
>> ![image](https://github.com/user-attachments/assets/ee29fdf0-a7cf-4d5b-bb6b-278b01d97e3c)
>> 
>> Operations involving 8 or more bytes are delegated to native code whereas 
>> smaller segments are handled via a switch rake.
>> 
>> It should be noted that `Arena::allocate` uses `MemorySegment::fill`. 
>> Hence, this PR will also have a positive effect on memory allocation 
>> performance.
>
> Per Minborg has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove unused imports

Changes requested by franz1...@github.com (no known OpenJDK username).

src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template line 210:

> 208:                 // Handle smaller segments directly without transitioning to native code
> 209:                 final long u = Byte.toUnsignedLong(value);
> 210:                 final long longValue = u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;

This can be `u * 0x0101010101010101L` if `value != 0` and just `0L` if not. I am 
not sure whether it is faster; that needs to be measured.

In most cases the fill value is 0, since zeroing is the most common use of fill.
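
For illustration, here is a minimal standalone sketch comparing the two ways of 
replicating a byte into all eight bytes of a `long`. The class and method names 
(`ByteBroadcast`, `viaShifts`, `viaMultiply`) are hypothetical and not part of 
the PR. The multiplier that places one copy of the byte in each of the eight 
byte positions is `0x0101010101010101L`. This only checks that the two forms 
agree; it says nothing about which is faster, which would need a JMH benchmark.

```java
public class ByteBroadcast {

    // Shift-and-or form, mirroring the expression in the quoted diff.
    public static long viaShifts(byte value) {
        final long u = Byte.toUnsignedLong(value);
        return u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
    }

    // Multiplication form: each 0x01 byte in the constant yields one copy of u.
    // Since u <= 0xFF, no byte position carries into the next one.
    public static long viaMultiply(byte value) {
        return Byte.toUnsignedLong(value) * 0x0101010101010101L;
    }

    public static void main(String[] args) {
        // Exhaustively compare both forms over every possible byte value.
        for (int i = 0; i < 256; i++) {
            final byte b = (byte) i;
            if (viaShifts(b) != viaMultiply(b)) {
                throw new AssertionError("Mismatch for byte value " + i);
            }
        }
        System.out.println("Both forms agree for all 256 byte values");
    }
}
```

For a zero fill value both forms already produce `0L`, so a dedicated 
`value == 0` branch would only save the arithmetic, not change the result.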

-------------

PR Review: https://git.openjdk.org/jdk/pull/20712#pullrequestreview-2271723714
PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1738292680
