On Thu, 23 Jan 2025 08:33:23 GMT, Matthias Ernst <[email protected]> wrote:
>> Certain signatures for foreign function calls (e.g. HVA return by value)
>> require allocation of an intermediate buffer to adapt the FFM's to the
>> native stub's calling convention. In the current implementation, this buffer
>> is malloced and freed on every FFM invocation, a non-negligible overhead.
>>
>> Sample stack trace:
>>
>> java.lang.Thread.State: RUNNABLE
>> at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>> ...
>> at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>> at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>> ...
>> at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>>
>>
>> To alleviate this, this PR implements a per carrier-thread stacked allocator.
>>
>> Performance (MBA M3):
>>
>>
>> Before:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   10   3.333 ± 0.152  ns/op
>> CallOverheadByValue.byValue  avgt   10  33.892 ± 0.034  ns/op
>>
>> After:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   30   3.311 ± 0.034  ns/op
>> CallOverheadByValue.byValue  avgt   30   6.143 ± 0.053  ns/op
>>
>>
>> `-prof gc` also shows that the new call path is fully scalar-replaced vs 160
>> byte/call before.
>
> Matthias Ernst has updated the pull request incrementally with four
> additional commits since the last revision:
>
> - test deep linker stack
> - Merge remote-tracking branch 'origin/mernst/cache-segments' into
> mernst/cache-segments
> - topOfStack
> - (c)
src/java.base/share/classes/jdk/internal/foreign/abi/BufferStack.java line 103:
> 101: @SuppressWarnings("restricted")
> 102: public MemorySegment allocate(long byteSize, long byteAlignment) {
> 103: return frame.allocate(byteSize, byteAlignment);
Should this also check order?
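
To make the question concrete, here is a minimal sketch of what an order check on allocate could look like, assuming a stack of frames where only the most recently pushed frame may allocate or close. OrderCheckedStack, Frame, pushFrame and checkTop are hypothetical names used for illustration only, not the BufferStack code in this PR, and the sketch tracks plain byte offsets instead of MemorySegments so that it stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration; not the BufferStack implementation from the PR. */
final class OrderCheckedStack {
    private final long capacity;                 // size of the (imaginary) backing buffer
    private long offset;                         // current top of stack, in bytes
    private final Deque<Frame> frames = new ArrayDeque<>();

    OrderCheckedStack(long capacity) {
        this.capacity = capacity;
    }

    /** Opens a new frame on top of the stack. */
    Frame pushFrame() {
        Frame frame = new Frame(offset);
        frames.push(frame);
        return frame;
    }

    final class Frame implements AutoCloseable {
        private final long base;                 // stack offset when this frame was opened

        private Frame(long base) {
            this.base = base;
        }

        /** Returns the offset of the new block; only the top frame may allocate. */
        long allocate(long byteSize, long byteAlignment) {
            checkTop();
            if (byteSize < 0 || Long.bitCount(byteAlignment) != 1) {
                throw new IllegalArgumentException("bad size or alignment");
            }
            // Round up to the next multiple of a power-of-two alignment.
            long aligned = (offset + byteAlignment - 1) & -byteAlignment;
            if (aligned + byteSize > capacity) {
                throw new IndexOutOfBoundsException("stack exhausted");
            }
            offset = aligned + byteSize;
            return aligned;
        }

        /** Pops this frame and releases everything allocated since it was opened. */
        @Override
        public void close() {
            checkTop();
            frames.pop();
            offset = base;
        }

        // The order check: reject any use of a frame that is not top of stack.
        private void checkTop() {
            if (frames.peek() != this) {
                throw new IllegalStateException("Out of order access: frame not top of stack");
            }
        }
    }
}
```

With a check like this, allocating from an outer frame while an inner frame is still open fails fast instead of handing out memory that the inner frame's close would later release.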
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1926910879