On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee <jver...@openjdk.org> wrote:
> As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This > `Method*` is read from the `LambdaForm::vmentry` field associated with the > target method handle at the time when the upcall stub is generated. The MH > instance itself is stashed in a global JNI ref. So, there should be a > reachability chain to the holder class: `MH (receiver) -> LF (form) -> > MemberName (vmentry) -> ResolvedMethodName (method) -> Class<?> (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the > `vmentry` field of the `LambdaForm` of the target method handle of an upcall > stub, it is possible that the `vmentry` is updated _after_ we embed the > corresponding `Method`* into an upcall stub (or rather, the latest update is > not visible to the thread generating the upcall stub). Technically, it is > fine to keep using a 'stale' `vmentry`, but the problem is that now the > reachability chain is broken, since the upcall stub only extracts the target > `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class > can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in > `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from > the target method handle each time. Luckily, this does not really seem to > impact performance. > > <details> > <summary>Performance numbers</summary> > x64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op > > > aarch64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op > > </details> > > As for the added TestUpcallStress test, it takes about 800 seconds to run > this test on the dev machine I'm using, so I've set the timeout quite high. > Since it runs for so long, I've dropped it from the default `jdk_foreign` > test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 src/hotspot/share/runtime/frame.cpp line 1132: > 1130: } > 1131: _cb->as_upcall_stub()->oops_do(f, *this); > 1132: } This was previously handled by overriding `preserve_callee_argument_oops` in `UpcallStub` as `ShouldNotReachHere`. That function was removed though. We should really have a check like this, since it helps rule out missed handling of the receiver handle which can cause GC issues, so I've added this here. I can move it to a separate PR as well, if preferred. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1742153421