On Thu, 23 Jan 2025 17:36:11 GMT, Matthias Ernst <d...@openjdk.org> wrote:
>> Certain signatures for foreign function calls (e.g. HVA return by value)
>> require allocation of an intermediate buffer to adapt the FFM's to the
>> native stub's calling convention. In the current implementation, this buffer
>> is malloced and freed on every FFM invocation, a non-negligible overhead.
>>
>> Sample stack trace:
>>
>>    java.lang.Thread.State: RUNNABLE
>>         at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native Method)
>>         ...
>>         at jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>>         at jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown Source)
>>         ...
>>         at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>>
>> To alleviate this, this PR implements a per carrier-thread stacked allocator.
>>
>> Performance (MBA M3):
>>
>> Before:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   10   3.333 ± 0.152  ns/op
>> CallOverheadByValue.byValue  avgt   10  33.892 ± 0.034  ns/op
>>
>> After:
>> Benchmark                    Mode  Cnt   Score   Error  Units
>> CallOverheadByValue.byPtr    avgt   30   3.311 ± 0.034  ns/op
>> CallOverheadByValue.byValue  avgt   30   6.143 ± 0.053  ns/op
>>
>> `-prof gc` also shows that the new call path is fully scalar-replaced vs 160
>> byte/call before.
>
> Matthias Ernst has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   fix test under VThread factory

Sorry to say, but the implementation seems to have a bug that is causing
occasional heap corruption, which is being caught by mac's malloc guards.

Since this is failing in tier 1, and the issue seems like it will take some
time to investigate and fix, I'm backing out the change for now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23142#issuecomment-2627569855