On Tue, 7 May 2024 15:42:23 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:
> This PR fixes an issue that has crept into the FFM API implementation. > > From very early stages, the FFM API used to disable the alignment check on > nested layout elements, in favor of an alignment check against the memory > segment base address. The rationale was that the JIT had issue with > eliminating redundant alignment checks, and accessing nested elements could > never result in alignment issues, since the alignment of the container is > provably bigger than that of the contained element. This means that, when > creating a var handle for a nested layout element, we set the nested layout > alignment to 1 (unaligned), derive its var handle, and then decorate the var > handle with an alignment check for the container. > > At some point in 22, we tweaked the API to throw > `UnsupportedOperationException` when using an access mode incompatible with > the alignment constraint of the accessed layout. That is, a volatile read on > an `int` is only possible if the access occurs at an address that is at least > 4-byte aligned. Otherwise an `UOE` is thrown. > > Unfortunately this change broke the aforementioned optimization: creating a > var handle for an unaligned layout works, but the resulting layout will *not* > support any of the atomic access modes. > > Since this optimization is not really required anymore (proper C2 support to > hoist/eliminate alignment checks has been added since then), it is better to > disable this implementation quirk, and leave optimizations to C2. > > (If we really really wanted to optimize things a bit, we could remove the > container alignment check in the case the accessed value is the first layout > element nested in the container, but this PR doesn't go that far). > > I've run relevant benchmarks before/after and found no differences. In part > this is because `arrayElementVarHandle` is unaffected. But, even after > tweaking the `LoopOverNonConstant` benchmark to add explicit tests for the > code path affected, no significant difference was found, sign that C2 is > indeed able to spot (and remove) the redundant alignment check. Note: if we > know that `aligned_to_N(base)` holds, then it's easy to prove that > `aligned_to_M(base + offset + index * scale)` holds, when `N >= M` and with > `offset` and `scale` known (the latter a power of two). LGTM. As mentioned in the PR notes, additional optimizations could be made and this could be the objective of a future PR. ------------- Marked as reviewed by pminborg (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19124#pullrequestreview-2043680093