On Fri, 25 Oct 2024 08:37:30 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:
> This PR fixes an issue where passing many by-reference parameters to downcall > results in escape analysis failures. > The problem is that, as the parameters grow, the generated code in the > trampoline stub we generate also grows. > When it reaches a certain threshold, it becomes too big, and it is no longer > inlined in the caller. > When this happens, allocations for by-reference parameters (e.g. a segment > constructed from `MemorySegment::ofAddress`) can no longer be eliminated. > > The solution is two-fold. First, we annotate the generated trampoline with > `@ForceInline`. After all, it is rather critical, to guarantee optimal > performance, that this code can be always inlined. > Second, to keep the size of the generated code under control, we also put a > limit on the max number of comparisons that are generated in order to > "deduplicate" scope acquire/release calls. > The deduplication logic is a bit finicky -- it was put in place because, when > confined and shared are passed by-ref, we need to prevent them from being > closed in the middle of a native call. > So, we save all the seen scopes in a bunch of locals, and then we compare > each new scope with _all_ the previous cached locals, and skip acquire if we > can. > > While this strategy work it does not scale when there are many by-ref > parameters - as a by-ref parameter N will need N - 1 comparisons - which > means a quadratic number of comparisons is generated. > This is fixed in this PR by putting a lid on the maximum number of > comparisons that are generated. We also make the comparisons a bit smarter, > by always skipping the very first by-ref argument -- the downcall address. > It is in fact very common for the downcall address to be in a different scope > than that of the other by-ref arguments anyway. > > A nice property of the new logic is that by configuring the max number of > comparisons we can effectively select between different strategies: > * max = 0, means no dedup > * max = 1, means one-element cache > * max = N, means full dedup (like before) > > Thanks to Ioannis (@spasi) for the report and the benchmark. I've beefed the > benchmark up by adding a case for 10 arguments, and also adding support for > critical downcalls, so we could also test passing by-ref heap segments. > Benchmark result will be provided in a separate comment. This pull request has now been integrated. Changeset: f1a9a8d2 Author: Maurizio Cimadamore <mcimadam...@openjdk.org> URL: https://git.openjdk.org/jdk/commit/f1a9a8d25b2e1f9b5dbe8719abb66ec4cd9057dc Stats: 303 lines in 3 files changed: 296 ins; 0 del; 7 mod 8342902: Deduplication of acquire calls in BindingSpecializer causes escape-analyisis failure Reviewed-by: jvernee ------------- PR: https://git.openjdk.org/jdk/pull/21706