On Thu, 12 Sep 2024 11:32:22 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:
>> src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java >> line 204: >> >>> 202: // This gives about 20% performance increase for large values >>> of `length`. >>> 203: // On non-Aarch64 architectures, the unroll code will be >>> eliminated at compile time. >>> 204: if (Architecture.isAARCH64() && NATIVE_THRESHOLD_MISMATCH > >>> 64) { >> >> I'm a bit opposed to this - as it goes in the direction to add a lot of >> transient complexity when in reality the underlying issue is that aarch64 >> mismatch intrinsics should be fixed. Tinkering with thresholds is >> borderline, but still acceptable - having different implementations one per >> platform starts to look "more wrong". > > In other words, I don't think the goal of this (and related) PR is "improve > mismatch so that it blows other alternatives - like Unsafe, or array" out of > the water - as much as it is "make sure that using MemorySegment::mismatch is > competitive with other offerings". Then, an interesting part of these PRs is that we have uncovered quite a lot of issues both with our intrinsics (e.g. `fill` is not vectorized and works badly on Windows, mismatch works poorly on aarch64) *and* missed optimization opportunities - e.g. "short" segment loops are not optimized as tightly as they could. But it is not the job of these PRs to fix all these issues. The main focus remain: for small sizes it is not worth going down intrinsics-lane. Let's stick to it (in the interest of keeping the review focused). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20848#discussion_r1756682363