On Sat, 11 Mar 2023 15:57:53 GMT, Roman Kennke <rken...@openjdk.org> wrote:

> > Proposal for omitting the lockstack size check (at least in 75% of all
> > times):
> >
> > * We know that Thread as well as grown lockstack backing buffers start at
> >   malloc-aligned boundaries. Practically this is 16 (64-bit), 4-8 (32-bit).
> >   So at the very least 4.
> > * Make the initial lockstack this size. Define it so that initial slot
> >   stack starts at offset 0.
> > * Load the current slot pointer as you do now. Check the lowest 2 bits. If
> >   all are zero, go the slower path (load the current limit and compare
> >   against limit, ...).
> > * If bit 0 or 1 are set, you can omit this check. You are done since you
> >   have not yet reached the limit.
> > * You can expand this proposal to any alignment you like. You need to
> >   declare the lockstack slots with `alignof(X)`, and the compiler will take
> >   care that the _initial_ slot stack is always well aligned. As for larger
> >   slot stacks, we will have to allocate them in an aligned fashion using
> >   posix_memalign (we need this as NMT-wrapped version, but that's trivial)
>
> This would only work when pushing a single slot, right? Have you seen what
> we're doing in the compiled (C1 and C2) paths (in x86_64 and aarch64)? There
> we're doing a (conservative) estimate how many lock-slots are needed in the
> method, and check for enough slots upon method entry once, and then elide the
> check altogether in the lock-enter implementation.

Yeah, I just realized this myself. I started working on the template
interpreter first, where we push single stack slots. There it may still make
sense.

-------------

PR: https://git.openjdk.org/jdk/pull/10907
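
For illustration, the low-bits check from the quoted proposal could be sketched roughly as below for the single-slot push case. This is a minimal sketch under assumptions, not the code in the PR: `SketchLockStack`, `kSlots`, `push` and `limit_checks` are hypothetical names, the buffer is a fixed 4-slot array aligned to its own byte size (via `alignas`), and growing into a larger, aligned-allocated buffer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch; names and layout are assumptions, not HotSpot's LockStack.
struct SketchLockStack {
  static constexpr std::size_t kSlots = 4;                       // must be a power of two
  static constexpr std::size_t kBytes = kSlots * sizeof(void*);  // buffer size and alignment

  alignas(kBytes) void* slots[kSlots];  // buffer starts on a kBytes boundary
  void** top = slots;                   // next free slot
  unsigned limit_checks = 0;            // counts how often the slow path ran

  SketchLockStack() {
    // Precondition of the trick: the buffer base has all low bits clear.
    assert((reinterpret_cast<std::uintptr_t>(slots) & (kBytes - 1)) == 0);
  }
  SketchLockStack(const SketchLockStack&) = delete;  // 'top' points into 'slots'

  // Pushes a single slot; returns false if the stack is full.
  bool push(void* obj) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(top);
    if ((bits & (kBytes - 1)) == 0) {
      // Low bits all zero: 'top' is at the base or at the limit,
      // so only here do we load and compare against the limit.
      ++limit_checks;
      if (top == slots + kSlots) {
        return false;  // a real implementation would grow the buffer instead
      }
    }
    // Some low bit is set: 'top' is strictly inside the buffer, no check needed.
    *top++ = obj;
    return true;
  }
};
```

With 4 slots, only the first push and the push that hits the limit take the slow path, which is where the "75%" figure in the proposal would come from. As noted in the thread, this only covers pushing one slot at a time.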