On Wed, 15 Jan 2025 21:39:05 GMT, Matthias Ernst <d...@openjdk.org> wrote:
> Certain signatures for foreign function calls require allocation of an
> intermediate buffer to adapt the FFM's to the native stub's calling
> convention ("needsReturnBuffer"). In the current implementation, this buffer
> is malloced and freed on every FFM invocation, a non-negligible overhead.
>
> Sample stack trace:
>
>    java.lang.Thread.State: RUNNABLE
>     at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>     at jdk.internal.misc.Unsafe.allocateMemory(java.base@25-ea/Unsafe.java:636)
>     at jdk.internal.foreign.SegmentFactories.allocateMemoryWrapper(java.base@25-ea/SegmentFactories.java:215)
>     at jdk.internal.foreign.SegmentFactories.allocateSegment(java.base@25-ea/SegmentFactories.java:193)
>     at jdk.internal.foreign.ArenaImpl.allocateNoInit(java.base@25-ea/ArenaImpl.java:55)
>     at jdk.internal.foreign.ArenaImpl.allocate(java.base@25-ea/ArenaImpl.java:60)
>     at jdk.internal.foreign.ArenaImpl.allocate(java.base@25-ea/ArenaImpl.java:34)
>     at java.lang.foreign.SegmentAllocator.allocate(java.base@25-ea/SegmentAllocator.java:645)
>     at jdk.internal.foreign.abi.SharedUtils$2.<init>(java.base@25-ea/SharedUtils.java:388)
>     at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>     at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>     at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@25-ea/DirectMethodHandle$Holder)
>     at java.lang.invoke.LambdaForm$MH/0x000001f00109a400.invoke(java.base@25-ea/LambdaForm$MH)
>     at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>
> When does this happen? A fairly easy way to trigger this is through returning
> a small aggregate like the following:
>
>     struct Vector2D {
>       double x, y;
>     };
>     Vector2D Origin() {
>       return {0, 0};
>     }
>
> On AArch64, such a struct is returned in two 128 bit registers v0/v1.
> The VM's calling convention for the native stub consequently expects a 32
> byte output segment argument.
> The FFM downcall method handle instead expects to create a 16 byte result
> segment through the application-provided SegmentAllocator, and needs to
> perform an appropriate adaptation, roughly like so:
>
>     MemorySegment downcallMH(SegmentAllocator a) {
>       MemorySegment tmp = SharedUtils.allocate(32);
>       try {
>         nativeStub.invoke(tmp); // leaves v0, v1 in tmp
>         MemorySegment result = a.allocate(16);
>         result.setDouble(0, tmp.getDouble(0));
>         result.setDouble(8, tmp.getDouble(16));
>         return result;
> ...

Could you add the benchmark you're using to the PR as well? The benchmark
should be put under `./test/micro/org/openjdk/bench/java/lang/foreign/`. This
will allow others to reproduce the results and, longer term, having a
benchmark on file would let us detect performance regressions and
improvements.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23142#issuecomment-2598550989
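
For readers following along, the adaptation in the quoted pseudocode can be approximated with the public FFM API (final since JDK 22). This is only a sketch under stated assumptions, not the JDK's internal code: `ReturnBufferSketch`, `fakeNativeStub` and `adapt` are hypothetical names, and the "stub" is plain Java that writes the two doubles where v0 and v1 would be spilled (offsets 0 and 16 of the 32 byte buffer).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;

public class ReturnBufferSketch {

    // Hypothetical stand-in for the native stub: it stores x in the low
    // 8 bytes of the v0 slot (offset 0) and y in the low 8 bytes of the
    // v1 slot (offset 16), mimicking two spilled 128 bit registers.
    static void fakeNativeStub(MemorySegment tmp, double x, double y) {
        tmp.set(ValueLayout.JAVA_DOUBLE, 0, x);
        tmp.set(ValueLayout.JAVA_DOUBLE, 16, y);
    }

    // Copies the 32 byte register-shaped buffer into the 16 byte struct
    // layout {double x; double y;} allocated from the caller's allocator.
    static MemorySegment adapt(SegmentAllocator a, double x, double y) {
        // A fresh confined arena per call is the malloc/free pair that
        // the quoted description identifies as overhead.
        try (Arena scratch = Arena.ofConfined()) {
            MemorySegment tmp = scratch.allocate(32, 8);
            fakeNativeStub(tmp, x, y);
            MemorySegment result = a.allocate(16, 8);
            result.set(ValueLayout.JAVA_DOUBLE, 0, tmp.get(ValueLayout.JAVA_DOUBLE, 0));
            result.set(ValueLayout.JAVA_DOUBLE, 8, tmp.get(ValueLayout.JAVA_DOUBLE, 16));
            return result;
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment v = adapt(arena, 1.5, -2.0);
            // prints "1.5, -2.0"
            System.out.println(v.get(ValueLayout.JAVA_DOUBLE, 0) + ", "
                    + v.get(ValueLayout.JAVA_DOUBLE, 8));
        }
    }
}
```

The scratch arena is closed before `adapt` returns, so only the 16 byte result (owned by the caller's allocator) outlives the call; the temporary 32 byte buffer is what gets allocated and freed on every invocation.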