Going forward, converting older JDK code to the relatively new FFM API
requires that system calls which can set `errno` and the like explicitly
allocate a `MemorySegment` to capture potential error states. If not designed
carefully, this can hurt performance, and it also introduces unnecessary code
complexity.
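
For illustration, here is a minimal sketch of what explicit capture looks like
with the public FFM API. It is not code from this PR: the class and method
names are hypothetical, and it assumes JDK 22 or later on a POSIX platform
where `close` is visible through the default lookup and `errno` is part of the
capture state layout.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

// Hypothetical example class, not part of the PR.
public final class ExplicitErrnoCapture {
    private static final Linker LINKER = Linker.nativeLinker();

    // Platform-specific layout of the state the linker can capture.
    private static final StructLayout CAPTURE_LAYOUT = Linker.Option.captureStateLayout();

    // Accessor for the "errno" member; coordinates are (MemorySegment, long baseOffset).
    private static final VarHandle ERRNO =
            CAPTURE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("errno"));

    // int close(int fd); capturing call state adds a leading MemorySegment parameter.
    private static final MethodHandle CLOSE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("close").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT),
            Linker.Option.captureCallState("errno"));

    /** Returns 0 on success, or the captured errno value if close() returned -1. */
    static int closeOrErrno(int fd) {
        // A fresh arena and segment are allocated on every call.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment capture = arena.allocate(CAPTURE_LAYOUT);
            int result = (int) CLOSE.invokeExact(capture, fd);
            return result == -1 ? (int) ERRNO.get(capture, 0L) : 0;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Every call pays for creating an arena and allocating a segment, and every call
site has to repeat the capture boilerplate.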

Hence, this PR proposes to add a JDK internal method handle adapter that can be 
used to handle system calls with `errno`, `GetLastError`, and `WSAGetLastError`.
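
As a rough sketch of the adapter idea (not the implementation in this PR), the
following uses only public `java.lang.invoke` API to turn a handle of type
`(MemorySegment, int)int` into one of type `(int)int`. Several things here are
assumptions made for illustration: all names are hypothetical, `fakeClose` is
a pure-Java stand-in for a real downcall handle, the captured state is modeled
as a single `int`, failures are reported as a negated error code, and the
segment is still allocated on every call.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class, not part of the PR.
public final class ErrnoAdapterSketch {

    // Stand-in for a native call: writes an error code into the capture
    // segment and returns -1 for a negative fd, otherwise returns 0.
    static int fakeClose(MemorySegment capture, int fd) {
        if (fd < 0) {
            capture.set(ValueLayout.JAVA_INT, 0, 9); // pretend errno = 9
            return -1;
        }
        return 0;
    }

    // Supplies the capture segment and folds the error state into the result.
    static int invokeAndFold(MethodHandle raw, int arg) throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment capture = arena.allocate(ValueLayout.JAVA_INT);
            int result = (int) raw.invokeExact(capture, arg);
            return result == -1 ? -capture.get(ValueLayout.JAVA_INT, 0) : result;
        }
    }

    // Adapts (MemorySegment, int)int to (int)int by binding the raw handle
    // as the first argument of invokeAndFold.
    static MethodHandle adapt(MethodHandle raw) {
        try {
            MethodHandle wrapper = MethodHandles.lookup().findStatic(
                    ErrnoAdapterSketch.class, "invokeAndFold",
                    MethodType.methodType(int.class, MethodHandle.class, int.class));
            return MethodHandles.insertArguments(wrapper, 0, raw);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static MethodHandle rawFakeClose() {
        try {
            return MethodHandles.lookup().findStatic(
                    ErrnoAdapterSketch.class, "fakeClose",
                    MethodType.methodType(int.class, MemorySegment.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static final MethodHandle ADAPTED = adapt(rawFakeClose());

    // Call sites no longer mention a MemorySegment at all.
    static int call(int fd) {
        try {
            return (int) ADAPTED.invokeExact(fd);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

With this shape, `ErrnoAdapterSketch.call(3)` returns 0 and
`ErrnoAdapterSketch.call(-1)` returns -9.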

It relies on an efficient carrier-thread-local cache of memory regions to
elide allocations.
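
The caching idea can be approximated with public API as follows. This is only
a sketch under stated assumptions: it uses a plain `ThreadLocal` (one segment
per thread, including per virtual thread) rather than the carrier-thread-local
cache described above, the names are hypothetical, the 16-byte size is an
assumed upper bound for the captured state, and re-entrant use of the segment
is not handled.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical example class, not part of the PR.
public final class ThreadLocalCaptureCache {
    // Assumed upper bound on the size of the captured state, in bytes.
    private static final long CAPTURE_BYTES = 16;

    // One long-lived native segment per thread, allocated lazily on first use.
    // Arena.ofAuto() leaves deallocation to the garbage collector.
    private static final ThreadLocal<MemorySegment> CACHE =
            ThreadLocal.withInitial(() -> Arena.ofAuto().allocate(CAPTURE_BYTES));

    /** Returns the calling thread's cached segment; no allocation after the first call. */
    static MemorySegment captureSegment() {
        return CACHE.get();
    }
}
```

The segment is reused without being cleared, so it should only be read after a
call that has just written the captured state into it.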

Here are some benchmarks run on virtual threads (the `OfVirtual` rows) and on
a platform thread, respectively:


Benchmark                                                  Mode  Cnt   Score   Error  Units
CaptureStateUtilBench.OfVirtual.adaptedSysCallFail         avgt   30  24.193 ± 0.268  ns/op
CaptureStateUtilBench.OfVirtual.adaptedSysCallSuccess      avgt   30   8.268 ± 0.080  ns/op
CaptureStateUtilBench.OfVirtual.explicitAllocationFail     avgt   30  42.076 ± 1.003  ns/op
CaptureStateUtilBench.OfVirtual.explicitAllocationSuccess  avgt   30  21.801 ± 0.138  ns/op
CaptureStateUtilBench.OfVirtual.tlAllocationFail           avgt   30  23.265 ± 0.087  ns/op
CaptureStateUtilBench.OfVirtual.tlAllocationSuccess        avgt   30   8.285 ± 0.155  ns/op

CaptureStateUtilBench.adaptedSysCallFail                   avgt   30  23.033 ± 0.423  ns/op
CaptureStateUtilBench.adaptedSysCallSuccess                avgt   30   3.676 ± 0.104  ns/op  // <- Happy path using an internal pool

CaptureStateUtilBench.explicitAllocationFail               avgt   30  42.023 ± 0.736  ns/op
CaptureStateUtilBench.explicitAllocationSuccess            avgt   30  22.013 ± 0.648  ns/op  // <- Allocating memory upon each invocation

CaptureStateUtilBench.tlAllocationFail                     avgt   30  22.050 ± 0.233  ns/op
CaptureStateUtilBench.tlAllocationSuccess                  avgt   30   3.756 ± 0.056  ns/op  // <- Using the pool explicitly from Java code


Adapted system call:


        return (int) ADAPTED_HANDLE.invoke(0, 0); // Uses a MH-internal pool


Explicit allocation:


        try (var arena = Arena.ofConfined()) {
            return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
        }


Thread Local allocation:


        try (var arena = POOLS.take()) {
            return (int) HANDLE.invoke(arena.allocate(4), 0, 0); // Uses a manually specified pool
        }


The adapted system call exhibits a ~6x performance improvement over the
existing "explicit allocation" scheme for the happy path on platform threads
(3.7 ns/op vs. 22.0 ns/op). Because there needs to be sharing across threads
for virtual-thread-capable carrier threads, the virtual-thread case is a bit
slower ("only" ~2.5x faster, 8.3 ns/op vs. 21.8 ns/op).

Tested and passed tiers 1-3.

-------------

Commit messages:
 - Bump copyright year
 - Add benchmarks
 - Add method handle adapter for system calls

Changes: https://git.openjdk.org/jdk/pull/23517/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23517&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8347408
  Stats: 1381 lines in 11 files changed: 1370 ins; 2 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/23517.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23517/head:pull/23517

PR: https://git.openjdk.org/jdk/pull/23517
