> Certain signatures for foreign function calls (e.g. HVA return by value)
> require allocation of an intermediate buffer to adapt the FFM's to the native
> stub's calling convention. In the current implementation, this buffer is
> malloced and freed on every FFM invocation, a non-negligible overhead.
>
> Sample stack trace:
>
>     java.lang.Thread.State: RUNNABLE
>         at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>         ...
>         at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>         at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>         ...
>         at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>
> To alleviate this, this PR implements a per carrier-thread stacked allocator.
>
> Performance (MBA M3):
>
> Before:
>     Benchmark                    Mode  Cnt   Score   Error  Units
>     CallOverheadByValue.byPtr    avgt   10   3.333 ± 0.152  ns/op
>     CallOverheadByValue.byValue  avgt   10  33.892 ± 0.034  ns/op
>
> After:
>     Benchmark                    Mode  Cnt   Score   Error  Units
>     CallOverheadByValue.byPtr    avgt   30   3.311 ± 0.034  ns/op
>     CallOverheadByValue.byValue  avgt   30   6.143 ± 0.053  ns/op
>
> `-prof gc` also shows that the new call path is fully scalar-replaced vs 160
> byte/call before.
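For readers unfamiliar with the term, the following is a minimal sketch of what a per-thread stacked allocator can look like. It is not the code in this PR. To stay runnable on older JDKs it uses `ByteBuffer` in place of a memory segment and a plain `ThreadLocal` in place of a carrier-thread local. The names `StackedBufferSketch`, `BufferStack`, `Frame` and the 256-byte capacity are hypothetical, and alignment handling is omitted.

```java
import java.nio.ByteBuffer;

// Illustrative only: a bump ("stacked") allocator over one reusable buffer.
final class StackedBufferSketch {

    /** One reusable buffer plus the offset of its first free byte. */
    static final class BufferStack {
        private final ByteBuffer backing;
        private int top = 0;

        BufferStack(int capacity) {
            this.backing = ByteBuffer.allocateDirect(capacity);
        }

        /** Opens a frame; closing it releases everything allocated inside. */
        Frame push() {
            return new Frame(this);
        }

        /** Current offset of the first free byte. */
        int top() {
            return top;
        }
    }

    /** Scope for one call; nested calls open nested frames. */
    static final class Frame implements AutoCloseable {
        private final BufferStack stack;
        private final int savedTop;

        Frame(BufferStack stack) {
            this.stack = stack;
            this.savedTop = stack.top;
        }

        ByteBuffer allocate(int size) {
            if (size > stack.backing.capacity() - stack.top) {
                // Slow path: the request does not fit, so allocate fresh memory.
                return ByteBuffer.allocateDirect(size);
            }
            // Fast path: carve a slice out of the reusable buffer, then bump top.
            ByteBuffer view = stack.backing.duplicate();
            view.position(stack.top);
            view.limit(stack.top + size);
            stack.top += size;
            return view.slice();
        }

        @Override
        public void close() {
            // Pop: restore the offset recorded when the frame was opened.
            stack.top = savedTop;
        }
    }

    /** Stand-in for a carrier-thread local: one stack per thread. */
    static final ThreadLocal<BufferStack> STACK =
            ThreadLocal.withInitial(() -> new BufferStack(256));
}
```

A call site would wrap its buffer use in `try (Frame f = STACK.get().push()) { ... }`. Because frames restore the saved offset on close, a nested call that runs while an outer frame is still open (for example a downcall made from an upcall) simply allocates above it, and oversized requests fall back to a fresh allocation.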

Matthias Ernst has updated the pull request incrementally with four additional commits since the last revision:

 - test deep linker stack
 - Merge remote-tracking branch 'origin/mernst/cache-segments' into mernst/cache-segments
 - topOfStack
 - (c)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23142/files
  - new: https://git.openjdk.org/jdk/pull/23142/files/f09a29de..0e6d5320

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23142&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23142&range=14-15

  Stats: 80 lines in 3 files changed: 70 ins; 2 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/23142.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23142/head:pull/23142

PR: https://git.openjdk.org/jdk/pull/23142