On Wed, 22 Jan 2025 10:12:13 GMT, Matthias Ernst <d...@openjdk.org> wrote:

>> Certain signatures for foreign function calls (e.g. HVA return by value) 
>> require allocation of an intermediate buffer to adapt the FFM's to the 
>> native stub's calling convention. In the current implementation, this buffer 
>> is malloced and freed on every FFM invocation, a non-negligible overhead.
>> 
>> Sample stack trace:
>> 
>>    java.lang.Thread.State: RUNNABLE
>>      at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native 
>> Method)
>> ...
>>      at 
>> jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>>      at 
>> jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown
>>  Source)
>> ...
>>      at 
>> java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>> 
>> 
>> To alleviate this, this PR remembers and reuses up to two small intermediate 
>> buffers per carrier-thread in subsequent calls.
>> 
>> Performance (MBA M3):
>> 
>> 
>> Before:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   10   3.333 ? 0.152  ns/op
>> CallOverheadByValue.byValue  avgt   10  33.892 ? 0.034  ns/op
>> 
>> After:
>> Benchmark                         Mode  Cnt    Score    Error  Units
>> CallOverheadByValue.byPtr    avgt   10  3.291 ? 0.031  ns/op
>> CallOverheadByValue.byValue  avgt   10  5.464 ? 0.007  ns/op
>> 
>> 
>> `-prof gc` also shows that the new call path is fully scalar-replaced vs 160 
>> byte/call before.
>
> Matthias Ernst has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   --unnecessary annotations

src/java.base/share/classes/jdk/internal/foreign/abi/BufferStack.java line 38:

> 36:         @SuppressWarnings("restricted")
> 37:         public MemorySegment allocate(long byteSize, long byteAlignment) {
> 38:             MemorySegment slice = backingSegment.asSlice(offset, 
> byteSize, byteAlignment);

You have re-implemented a slicing allocator 
(`SegmentAllocator::slicingAllocator`). I think that's probably ok, but I'll 
point out that slicing allocator will try to deal with alignment, and will also 
throw exceptions when called with non-sensical arguments.

A more robust way to do this would be to:
1. have `reserve` pass the reserved size to `Frame`
2. `Frame` will slice the segment according to offset/size
3. then create a slicing allocator based on that segment
4. use the slicing allocator to implement `allocate`

In our tests, something like this did not add any extra overhead (the 
allocation of the slicing allocator is escape-analyzed away)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1925099911

Reply via email to