On Thu, 23 Jan 2025 08:33:23 GMT, Matthias Ernst <d...@openjdk.org> wrote:

>> Certain signatures for foreign function calls (e.g. HVA return by value)
>> require allocation of an intermediate buffer to adapt the FFM's calling
>> convention to the native stub's. In the current implementation, this buffer
>> is malloced and freed on every FFM invocation, a non-negligible overhead.
>> 
>> Sample stack trace:
>> 
>>    java.lang.Thread.State: RUNNABLE
>>      at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>> ...
>>      at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>>      at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>> ...
>>      at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>> 
>> 
>> To alleviate this, this PR implements a per carrier-thread stacked allocator.
>> 
>> Performance (MBA M3):
>> 
>> 
>> Before:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   10   3.333 ± 0.152  ns/op
>> CallOverheadByValue.byValue  avgt   10  33.892 ± 0.034  ns/op
>> 
>> After:
>> Benchmark                    Mode  Cnt  Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   30  3.311 ± 0.034  ns/op
>> CallOverheadByValue.byValue  avgt   30  6.143 ± 0.053  ns/op
>> 
>> 
>> `-prof gc` also shows that the new call path is fully scalar-replaced, versus
>> 160 bytes/call before.
>
> Matthias Ernst has updated the pull request incrementally with four 
> additional commits since the last revision:
> 
>  - test deep linker stack
>  - Merge remote-tracking branch 'origin/mernst/cache-segments' into mernst/cache-segments
>  - topOfStack
>  - (c)

src/java.base/share/classes/jdk/internal/foreign/abi/BufferStack.java line 65:

> 63:         @ForceInline
> 64:         public Arena pushFrame(long size, long byteAlignment) {
> 65:             boolean needsLock = Thread.currentThread().isVirtual() && !lock.isHeldByCurrentThread();

@minborg please check this -- you have discovered some cases where checking
`isVirtual` is not enough (e.g. because virtual threads can use carrier threads
from the common pool, which can also run non-virtual-thread work)
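
To make the concern concrete, here is a minimal sketch of the guard in question. It is not the actual `BufferStack` code: the class `FrameGuard`, its `depth` counter, and the `Frame` handle are hypothetical names used only for illustration, and the sketch assumes a JDK where `Thread.isVirtual()` is available (21 or later). Only the `needsLock` condition is taken from the quoted line.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the actual BufferStack implementation.
final class FrameGuard {
    private final ReentrantLock lock = new ReentrantLock();
    private int depth; // number of frames currently pushed

    /** Pushes a frame and returns a handle that pops it on close. */
    Frame pushFrame() {
        // Same condition as the quoted line: lock only on virtual threads,
        // and only if the current thread does not already hold the lock.
        boolean needsLock = Thread.currentThread().isVirtual()
                && !lock.isHeldByCurrentThread();
        if (needsLock) {
            lock.lock();
        }
        depth++;
        return new Frame(needsLock);
    }

    int depth() {
        return depth;
    }

    boolean isLocked() {
        return lock.isLocked();
    }

    final class Frame implements AutoCloseable {
        private final boolean locked; // true only if this frame took the lock

        Frame(boolean locked) {
            this.locked = locked;
        }

        @Override
        public void close() {
            depth--;
            if (locked) {
                lock.unlock(); // release only what this frame acquired
            }
        }
    }
}
```

In this sketch a platform thread never takes the lock, because `isVirtual()` returns false for it. Any code path where a non-virtual thread can reach the same per-carrier state therefore runs unguarded, which is the case worth checking.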

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1926906087
