On Mon, 20 Jan 2025 16:20:20 GMT, Matthias Ernst <d...@openjdk.org> wrote:
>> Certain signatures for foreign function calls (e.g. HVA return by value) >> require allocation of an intermediate buffer to adapt the FFM's to the >> native stub's calling convention. In the current implementation, this buffer >> is malloced and freed on every FFM invocation, a non-negligible overhead. >> >> Sample stack trace: >> >> java.lang.Thread.State: RUNNABLE >> at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native >> Method) >> ... >> at >> jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386) >> at >> jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown >> Source) >> ... >> at >> java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder) >> >> >> To alleviate this, this PR remembers and reuses up to two small intermediate >> buffers per carrier-thread in subsequent calls. >> >> Performance (MBA M3): >> >> >> master@764d70b7df18e288582e616c62b0d7078f1ff3aa >> Benchmark Mode Cnt Score Error Units >> PointsAlloc.circle_by_ptr avgt 30 9.197 ? 0.037 ns/op >> PointsAlloc.circle_by_value avgt 30 42.195 ? 0.088 ns/op <= >> ####### >> PointsAlloc.jni_ByteBuffer_alloc avgt 30 226.127 ? 35.378 ns/op >> PointsAlloc.jni_long_alloc avgt 30 25.297 ? 2.457 ns/op >> PointsAlloc.panama_alloc avgt 30 27.053 ? 1.915 ns/op >> >> After: >> Benchmark Mode Cnt Score Error Units >> PointsAlloc.circle_by_ptr avgt 30 9.156 ? 0.021 ns/op >> PointsAlloc.circle_by_value avgt 30 11.995 ? 0.051 ns/op <= >> ####### >> PointsAlloc.jni_ByteBuffer_alloc avgt 30 211.161 ? 23.284 ns/op >> PointsAlloc.jni_long_alloc avgt 30 24.885 ? 2.461 ns/op >> PointsAlloc.panama_alloc avgt 30 26.905 ? 1.935 ns/op >> >> >> `-prof gc` also shows that the new call path is fully scalar-replaced vs 160 >> byte/call before. > > Matthias Ernst has updated the pull request incrementally with three > additional commits since the last revision: > > - shift api boundary > - move bench > - revert formatting src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 393: > 391: MemorySegment unscoped = CallBufferCache.acquireOrAllocate(size); > 392: Arena scope = Arena.ofConfined(); > 393: MemorySegment source = unscoped.reinterpret(scope, null); I suggest passing the `scope` to `acquireOrAllocate` as well, so we only need a single call to `reinterpret` (it has an overload that takes both a size and arena. (And I think you can then also remove the `@SuppressWarnings` from this method) src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 395: > 393: MemorySegment source = unscoped.reinterpret(scope, null); > 394: // Preferable we'd like to register this cleanup in the line > above > 395: // but it breaks scalar replacement. I have a fix for this issue. Will create a PR. test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 54: > 52: @State(org.openjdk.jmh.annotations.Scope.Thread) > 53: @OutputTimeUnit(TimeUnit.NANOSECONDS) > 54: @Fork(value = 1, jvmArgs = {"-Xlog:gc", > "--enable-native-access=ALL-UNNAMED", "-Djava.library.path=micro/native"}) FWIW, unfortunately there is no builtin support to run profilers through `make`. I personally have a separate script that runs the benchmarks.jar: ./build/$CONF/images/jdk/bin/java -jar ./build/$CONF/images/test/micro/benchmarks.jar -prof gc AllocTest.alloc_confined (where `$CONF` is your build configuration) test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 98: > 96: (SegmentAllocator) (_, _) -> dest, > 97: phi); > 98: } Would it be viable to measure just a single invocation? That way we can remove other things like the `asSlice` call, which might introduce noise. test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 99: > 97: phi); > 98: } > 99: return points; I don't think `points` needs to be returned here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922701981 PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922701268 PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922699935 PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922700908 PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922698930