Re: RFR: 8287788: Implement a better allocator for downcalls [v16]

Maurizio Cimadamore Thu, 23 Jan 2025 04:40:46 -0800

On Thu, 23 Jan 2025 08:33:23 GMT, Matthias Ernst <[email protected]> wrote:


>> Certain signatures for foreign function calls (e.g. HVA return by value) 
>> require allocation of an intermediate buffer to adapt the FFM's to the 
>> native stub's calling convention. In the current implementation, this buffer 
>> is malloced and freed on every FFM invocation, a non-negligible overhead.
>> 
>> Sample stack trace:
>> 
>>    java.lang.Thread.State: RUNNABLE
>>      at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>> ...
>>      at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>>      at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>> ...
>>      at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>> 
>> 
>> To alleviate this, this PR implements a per carrier-thread stacked allocator.
>> 
>> Performance (MBA M3):
>> 
>> 
>> Before:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   10   3.333 ± 0.152  ns/op
>> CallOverheadByValue.byValue  avgt   10  33.892 ± 0.034  ns/op
>> 
>> After:
>> Benchmark                    Mode  Cnt  Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   30  3.311 ± 0.034  ns/op
>> CallOverheadByValue.byValue  avgt   30  6.143 ± 0.053  ns/op
>> 
>> 
>> `-prof gc` also shows that the new call path is fully scalar-replaced, 
>> vs. 160 bytes/call before.
>
> Matthias Ernst has updated the pull request incrementally with four 
> additional commits since the last revision:
> 
>  - test deep linker stack
>  - Merge remote-tracking branch 'origin/mernst/cache-segments' into 
> mernst/cache-segments
>  - topOfStack
>  - (c)

src/java.base/share/classes/jdk/internal/foreign/abi/BufferStack.java line 103:

> 101:             @SuppressWarnings("restricted")
> 102:             public MemorySegment allocate(long byteSize, long byteAlignment) {
> 103:                 return frame.allocate(byteSize, byteAlignment);

Should this also check order?
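
For illustration only, here is a minimal sketch of what such an order check
could look like. It is not the BufferStack code under review: the class and
method names (OrderCheckedStack, push, checkTopOfStack) are hypothetical, and
it hands out plain long offsets instead of MemorySegments to stay
self-contained. It also assumes byteAlignment is a power of two.

```java
// Hypothetical sketch of a stacked allocator whose frames verify that they
// are the top of the stack before allocating or closing.
final class OrderCheckedStack {
    private final long capacity;
    private long top = 0;   // offset of the first free byte
    private int depth = 0;  // number of currently open frames

    OrderCheckedStack(long capacity) {
        this.capacity = capacity;
    }

    // Opens a new frame on top of the stack.
    Frame push() {
        return new Frame(++depth, top);
    }

    final class Frame implements AutoCloseable {
        private final int level;  // position of this frame in the stack
        private final long base;  // value of 'top' when the frame was opened

        private Frame(int level, long base) {
            this.level = level;
            this.base = base;
        }

        // Returns the offset of the allocated block within the stack.
        long allocate(long byteSize, long byteAlignment) {
            checkTopOfStack();
            // Round 'top' up to the next multiple of byteAlignment.
            long start = (top + byteAlignment - 1) & -byteAlignment;
            if (start + byteSize > capacity) {
                throw new IndexOutOfBoundsException("stack exhausted");
            }
            top = start + byteSize;
            return start;
        }

        // Releases everything allocated since this frame was opened.
        @Override
        public void close() {
            checkTopOfStack();
            top = base;
            depth--;
        }

        private void checkTopOfStack() {
            if (level != depth) {
                throw new IllegalStateException("frame is not top of stack");
            }
        }
    }
}
```

With this shape, allocating from an outer frame while an inner frame is still
open fails fast instead of silently handing out space above the inner frame.
The sketch does not detect a closed frame being reused after a later push
reaches the same depth; a real implementation would need an extra check for
that.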

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1926910879
