On Wed, 15 Jan 2025 16:09:36 GMT, Per Minborg <pminb...@openjdk.org> wrote:
>> Going forward, converting older JDK code to use the relatively new FFM API
>> requires system calls that can provide `errno` and the like to explicitly
>> allocate a MemorySegment to capture potential error states. This can lead to
>> negative performance implications if not designed carefully and also
>> introduces unnecessary code complexity.
>>
>> Hence, this PR proposes to add a _JDK internal_ method handle adapter that
>> can be used to handle system calls with `errno`, `GetLastError`, and
>> `WSAGetLastError`.
>>
>> It currently relies on a thread-local cache of MemorySegments to elide
>> allocations. If, in the future, a more efficient thread-associated
>> allocation scheme becomes available, we could easily migrate to that one.
>>
>> Here are some benchmarks:
>>
>>
>> Benchmark                                        Mode  Cnt   Score   Error  Units
>> CaptureStateUtilBench.explicitAllocationFail     avgt   30  41.615 ± 1.203  ns/op
>> CaptureStateUtilBench.explicitAllocationSuccess  avgt   30  23.094 ± 0.580  ns/op
>> CaptureStateUtilBench.threadLocalFail            avgt   30  14.760 ± 0.078  ns/op
>> CaptureStateUtilBench.threadLocalReuseSuccess    avgt   30   7.189 ± 0.151  ns/op
>>
>>
>> Explicit allocation:
>>
>>     try (var arena = Arena.ofConfined()) {
>>         return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
>>     }
>>
>>
>> Thread Local (tl):
>>
>>     return (int) ADAPTED_HANDLE.invoke(arena.allocate(4), 0, 0);
>>
>>
>> The graph below shows the difference in latency for a successful call:
>>
>> 
>>
>> This is a ~3x improvement for both the happy and the error path.
>>
>>
>> Tested and passed tiers 1-3.
>
> Per Minborg has updated the pull request incrementally with two additional
> commits since the last revision:
>
>  - Use invokeExact semantics in the tests
>  - Clean up

src/java.base/share/classes/jdk/internal/foreign/CaptureStateUtil.java line 51:

> 49:     private static final long SIZE = Linker.Option.captureStateLayout().byteSize();
> 50:
> 51:     private static final TerminatingThreadLocal<SegmentCache> TL_CACHE = new TerminatingThreadLocal<>() {

I wonder if this should be encapsulated inside the SegmentCache class. In
principle the client should only do `acquire` which will get the thread-local
cache, and try to get a segment from there, if possible (but the entire
operation should be transparent from the client perspective).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1917077192
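
To make the encapsulation suggestion above concrete, here is a rough sketch of
what I have in mind. It is not the code in the PR: it uses a plain
`ThreadLocal` instead of the JDK-internal `TerminatingThreadLocal`, a direct
`ByteBuffer` as a stand-in for `MemorySegment`, and an assumed constant for
`SIZE`, so that it compiles outside `java.base`. The `release` method and the
single-slot cache are also assumptions made only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: the thread-local is a private detail of the cache,
// so clients only ever call acquire() and release().
final class SegmentCache {

    // Assumed stand-in for Linker.Option.captureStateLayout().byteSize().
    private static final int SIZE = 16;

    // Plain ThreadLocal here; the PR uses the internal TerminatingThreadLocal.
    private static final ThreadLocal<SegmentCache> TL_CACHE =
            ThreadLocal.withInitial(SegmentCache::new);

    // At most one cached buffer per thread (assumption for this sketch).
    private ByteBuffer cached;

    private SegmentCache() {}

    // Returns the calling thread's cached buffer if present, else a fresh one.
    static ByteBuffer acquire() {
        SegmentCache cache = TL_CACHE.get();
        ByteBuffer buffer = cache.cached;
        if (buffer != null) {
            cache.cached = null; // hand it out; the slot is now empty
            return buffer;
        }
        return ByteBuffer.allocateDirect(SIZE);
    }

    // Puts the buffer back into the calling thread's slot if the slot is free;
    // otherwise the buffer is simply dropped.
    static void release(ByteBuffer buffer) {
        SegmentCache cache = TL_CACHE.get();
        if (cache.cached == null) {
            cache.cached = buffer;
        }
    }
}
```

With this shape the adapter would only see `SegmentCache.acquire()` and
`SegmentCache.release(...)`, and how the cache is looked up per thread stays
an implementation detail of the class.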