On Mon, 20 Jan 2025 18:13:53 GMT, Matthias Ernst <d...@openjdk.org> wrote:

>> Certain signatures for foreign function calls (e.g. HVA return by value) 
>> require allocation of an intermediate buffer to adapt the FFM calling 
>> convention to the native stub's. In the current implementation, this buffer 
>> is malloced and freed on every FFM invocation, a non-negligible overhead.
>> 
>> Sample stack trace:
>> 
>>    java.lang.Thread.State: RUNNABLE
>>      at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>> ...
>>      at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>>      at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>> ...
>>      at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>> 
>> 
>> To alleviate this, this PR remembers and reuses up to two small intermediate 
>> buffers per carrier-thread in subsequent calls.
>> 
>> Performance (MBA M3):
>> 
>> 
>> Before:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   10   3.333 ± 0.152  ns/op
>> CallOverheadByValue.byValue  avgt   10  33.892 ± 0.034  ns/op
>> 
>> After:
>> Benchmark                    Mode  Cnt  Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   10  3.291 ± 0.031  ns/op
>> CallOverheadByValue.byValue  avgt   10  5.464 ± 0.007  ns/op
>> 
>> 
>> `-prof gc` also shows that the new call path is fully scalar-replaced, vs. 
>> 160 bytes/call before.
>
> Matthias Ernst has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   whitespace :scream:

test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 54:

> 52: @State(org.openjdk.jmh.annotations.Scope.Thread)
> 53: @OutputTimeUnit(TimeUnit.NANOSECONDS)
> 54: @Fork(value = 1, jvmArgs = {"--enable-native-access=ALL-UNNAMED", "-Djava.library.path=micro/native"})

Please set the fork value to at least 3, so we can spot bimodal results.
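
For context: JMH runs each fork in a fresh JVM, so with a single fork only one JVM's steady state is ever measured, and a run-to-run difference cannot show up. With several forks, the per-fork averages can be compared. The sketch below is only an illustration of that comparison; `ForkSpread`, `forksDisagree`, and the tolerance parameter are hypothetical names and not part of JMH or of this PR.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of JMH: compares per-fork average scores. */
public class ForkSpread {

    /**
     * Returns true when the slowest fork's score exceeds the fastest fork's
     * score by more than the given relative tolerance (e.g. 0.10 for 10%).
     * Assumes all scores are positive, as ns/op averages are.
     */
    static boolean forksDisagree(double[] perForkScores, double tolerance) {
        if (perForkScores.length < 2) {
            // A single fork has nothing to be compared against.
            throw new IllegalArgumentException("need scores from at least two forks");
        }
        double min = Arrays.stream(perForkScores).min().getAsDouble();
        double max = Arrays.stream(perForkScores).max().getAsDouble();
        return (max - min) > tolerance * min;
    }
}
```

The per-fork scores would have to be read from the per-fork sections of the JMH output; the helper deliberately rejects a single score, which is the situation `@Fork(value = 1)` produces.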

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1922753705
