Re: RFR: 8339531: Improve performance of MemorySegment::mismatch [v12]

Maurizio Cimadamore Thu, 12 Sep 2024 04:37:54 -0700

On Thu, 12 Sep 2024 11:32:22 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> 
wrote:


>> src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java 
>> line 204:
>> 
>>> 202:         // This gives about 20% performance increase for large values 
>>> of `length`.
>>> 203:         // On non-Aarch64 architectures, the unroll code will be 
>>> eliminated at compile time.
>>> 204:         if (Architecture.isAARCH64() && NATIVE_THRESHOLD_MISMATCH > 
>>> 64) {
>> 
>> I'm a bit opposed to this - as it goes in the direction to add a lot of 
>> transient complexity when in reality the underlying issue is that aarch64 
>> mismatch intrinsics should be fixed. Tinkering with thresholds is 
>> borderline, but still acceptable - having different implementations one per 
>> platform starts to look "more wrong".
>
> In other words, I don't think the goal of this (and related) PR is "improve 
> mismatch so that it blows other alternatives - like Unsafe, or array" out of 
> the water - as much as it is "make sure that using MemorySegment::mismatch is 
> competitive with other offerings".

Then, an interesting part of these PRs is that we have uncovered quite a lot of 
issues both with our intrinsics (e.g. `fill` is not vectorized and works badly 
on Windows, `mismatch` works poorly on aarch64) *and* missed optimization 
opportunities - e.g. "short" segment loops are not optimized as tightly as they 
could be. But it is not the job of these PRs to fix all these issues. The main 
focus remains: for small sizes it is not worth going down the intrinsics lane. 
Let's stick to it (in the interest of keeping the review focused).
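
To make the "small sizes" point concrete, here is a minimal sketch of 
size-based dispatch. It is not the code in this PR: it uses plain `byte[]` 
arrays as a stand-in for segments, `Arrays.mismatch` as a stand-in for the bulk 
path, and an arbitrary cutoff of 64 bytes. The class name `ThresholdMismatch` 
and the constant `THRESHOLD` are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

final class ThresholdMismatch {
    // Hypothetical cutoff in bytes; an assumption for this sketch, not the PR's value.
    static final int THRESHOLD = 64;

    /**
     * Returns the index of the first differing byte, the length of the shorter
     * array if one is a proper prefix of the other, or -1 if both are equal.
     */
    static int mismatch(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        if (length >= THRESHOLD) {
            // Large input: hand off to the bulk routine.
            return Arrays.mismatch(a, b);
        }
        // Small input: a plain loop avoids the fixed cost of the bulk path.
        for (int i = 0; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : length;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 9, 4};
        System.out.println(mismatch(x, y)); // small inputs take the loop path
    }
}
```

Both paths return the same answer for any input; only the cost profile 
differs, which is why the cutoff can be tuned without changing behavior.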

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20848#discussion_r1756682363
