Re: RFR: 8342902: Deduplication of acquire calls in BindingSpecializer causes escape-analyisis failure

Jorn Vernee Fri, 25 Oct 2024 08:09:41 -0700

On Fri, 25 Oct 2024 08:37:30 GMT, Maurizio Cimadamore <[email protected]> 
wrote:


> This PR fixes an issue where passing many by-reference parameters to downcall 
> results in escape analysis failures.
> The problem is that, as the parameters grow, the generated code in the 
> trampoline stub we generate also grows.
> When it reaches a certain threshold, it becomes too big, and it is no longer 
> inlined in the caller.
> When this happens, allocations for by-reference parameters (e.g. a segment 
> constructed from `MemorySegment::ofAddress`) can no longer be eliminated.
> 
> The solution is two-fold. First, we annotate the generated trampoline with 
> `@ForceInline`. After all, it is rather critical, to guarantee optimal 
> performance, that this code can be always inlined.
> Second, to keep the size of the generated code under control, we also put a 
> limit on the max number of comparisons that are generated in order to 
> "deduplicate" scope acquire/release calls.
> The deduplication logic is a bit finicky -- it was put in place because, when 
> confined and shared are passed by-ref, we need to prevent them from being 
> closed in the middle of a native call.
> So, we save all the seen scopes in a bunch of locals, and then we compare 
> each new scope with _all_ the previous cached locals, and skip acquire if we 
> can.
> 
> While this strategy work it does not scale when there are many by-ref 
> parameters - as a by-ref parameter N will need N - 1 comparisons - which 
> means a quadratic number of comparisons is generated.
> This is fixed in this PR by putting a lid on the maximum number of 
> comparisons that are generated. We also make the comparisons a bit smarter, 
> by always skipping the very first by-ref argument -- the downcall address.
> It is in fact very common for the downcall address to be in a different scope 
> than that of the other by-ref arguments anyway.
> 
> A nice property of the new logic is that by configuring the max number of 
> comparisons we can effectively select between different strategies:
> * max = 0, means no dedup
> * max = 1, means one-element cache
> * max = N, means full dedup (like before)
> 
> Thanks to Ioannis (@spasi) for the report and the benchmark. I've beefed the 
> benchmark up by adding a case for 10 arguments, and also adding support for 
> critical downcalls, so we could also test passing by-ref heap segments. 
> Benchmark result will be provided in a separate comment.

Looks great, aside from the native lib issue (inline)

test/micro/org/openjdk/bench/java/lang/foreign/libCallByRefHighArity.c line 33:

> 31: EXPORT void noop_params5(int param0, int param1, void *param2, void 
> *param3, void *param4) {}
> 32: EXPORT void noop_params10(int param0, int param1, void *param2, void 
> *param3, void *param4,
> 33:                           int param5, int param6, void *param7, void 
> *param8, void *param9) {}

Some of these are `int` which mismatches the `ADDRESS` layout used to create 
the function descriptor.

-------------

Marked as reviewed by jvernee (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21706#pullrequestreview-2395567620
PR Review Comment: https://git.openjdk.org/jdk/pull/21706#discussion_r1816850025

Re: RFR: 8342902: Deduplication of acquire calls in BindingSpecializer causes escape-analyisis failure

Reply via email to