On Wed, 15 Jan 2025 21:39:05 GMT, Matthias Ernst <d...@openjdk.org> wrote:

> Certain signatures for foreign function calls require allocation of an 
> intermediate buffer to adapt the FFM's to the native stub's calling 
> convention ("needsReturnBuffer"). In the current implementation, this buffer 
> is malloced and freed on every FFM invocation, a non-negligible overhead.
> 
> Sample stack trace:
> 
>    java.lang.Thread.State: RUNNABLE
>       at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>       at jdk.internal.misc.Unsafe.allocateMemory(java.base@25-ea/Unsafe.java:636)
>       at jdk.internal.foreign.SegmentFactories.allocateMemoryWrapper(java.base@25-ea/SegmentFactories.java:215)
>       at jdk.internal.foreign.SegmentFactories.allocateSegment(java.base@25-ea/SegmentFactories.java:193)
>       at jdk.internal.foreign.ArenaImpl.allocateNoInit(java.base@25-ea/ArenaImpl.java:55)
>       at jdk.internal.foreign.ArenaImpl.allocate(java.base@25-ea/ArenaImpl.java:60)
>       at jdk.internal.foreign.ArenaImpl.allocate(java.base@25-ea/ArenaImpl.java:34)
>       at java.lang.foreign.SegmentAllocator.allocate(java.base@25-ea/SegmentAllocator.java:645)
>       at jdk.internal.foreign.abi.SharedUtils$2.<init>(java.base@25-ea/SharedUtils.java:388)
>       at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>       at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>       at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@25-ea/DirectMethodHandle$Holder)
>       at java.lang.invoke.LambdaForm$MH/0x000001f00109a400.invoke(java.base@25-ea/LambdaForm$MH)
>       at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
> 
> 
> When does this happen? A fairly easy way to trigger this is through returning 
> a small aggregate like the following:
> 
> struct Vector2D {
>   double x, y;
> };
> Vector2D Origin() {
>   return {0, 0};
> }
> 
> 
> On AArch64, such a struct is returned in two 128-bit registers, v0 and v1.
> The VM's calling convention for the native stub consequently expects a 
> 32-byte output segment argument.
> The FFM downcall method handle instead expects to create a 16-byte result 
> segment through the application-provided SegmentAllocator, and needs to 
> perform an appropriate adaptation, roughly like so:
> 
>   MemorySegment downcallMH(SegmentAllocator a) {
>     MemorySegment tmp = SharedUtils.allocate(32);
>     try {
>       nativeStub.invoke(tmp);  // leaves v0, v1 in tmp
>       MemorySegment result = a.allocate(16);
>       result.setDouble(0, tmp.getDouble(0));
>       result.setDouble(8, tmp.getDouble(16));
>       return result;
>    ...

Could you add the benchmark you're using to the PR as well? It should go 
under `./test/micro/org/openjdk/bench/java/lang/foreign/`. That will let 
others reproduce the results, and keeping a benchmark in the repository will 
let us detect performance regressions or improvements later on.
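
For anyone who wants to poke at the adaptation described in the quoted text 
without a JDK build, here is a minimal plain-Java sketch (Java 22+, public 
FFM API only). It is not a JMH benchmark and not the code from the PR: the 
class name, `fakeNativeStub`, and the per-call confined arena are 
hypothetical stand-ins for the native stub and the internal 
`SharedUtils` allocation, used only to show the 32-byte to 16-byte copy.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;

final class ReturnBufferSketch {

    // Hypothetical stand-in for the native stub: stores the two return
    // registers into 16-byte slots of the 32-byte buffer (v0 at 0, v1 at 16).
    static void fakeNativeStub(MemorySegment tmp, double x, double y) {
        tmp.set(ValueLayout.JAVA_DOUBLE, 0, x);
        tmp.set(ValueLayout.JAVA_DOUBLE, 16, y);
    }

    // Mirrors the shape of the quoted pseudocode: allocate a temporary
    // return buffer on every call, then copy the low 8 bytes of each slot
    // into the 16-byte result obtained from the caller's allocator.
    static MemorySegment downcall(SegmentAllocator a, double x, double y) {
        try (Arena scratch = Arena.ofConfined()) {   // allocated and freed per call
            MemorySegment tmp = scratch.allocate(32, 16);
            fakeNativeStub(tmp, x, y);
            MemorySegment result = a.allocate(16, 8);
            result.set(ValueLayout.JAVA_DOUBLE, 0, tmp.get(ValueLayout.JAVA_DOUBLE, 0));
            result.set(ValueLayout.JAVA_DOUBLE, 8, tmp.get(ValueLayout.JAVA_DOUBLE, 16));
            return result;
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment v = downcall(arena, 1.5, -2.0);
            System.out.println(v.get(ValueLayout.JAVA_DOUBLE, 0) + " "
                    + v.get(ValueLayout.JAVA_DOUBLE, 8));
        }
    }
}
```

The `scratch` arena is the part that corresponds to the malloc/free pair in 
the stack trace; a real benchmark would measure an actual downcall handle 
rather than this simulation.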

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23142#issuecomment-2598550989
