On Wed, 4 Sep 2024 06:14:57 GMT, Fei Yang <fy...@openjdk.org> wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This >> `Method*` is read from the `LambdaForm::vmentry` field associated with the >> target method handle at the time when the upcall stub is generated. The MH >> instance itself is stashed in a global JNI ref. So, there should be a >> reachability chain to the holder class: `MH (receiver) -> LF (form) -> >> MemberName (vmentry) -> ResolvedMethodName (method) -> Class<?> (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the >> `vmentry` field of the `LambdaForm` of the target method handle of an upcall >> stub, it is possible that the `vmentry` is updated _after_ we embed the >> corresponding `Method`* into an upcall stub (or rather, the latest update is >> not visible to the thread generating the upcall stub). Technically, it is >> fine to keep using a 'stale' `vmentry`, but the problem is that now the >> reachability chain is broken, since the upcall stub only extracts the target >> `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class >> can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in >> `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from >> the target method handle each time. Luckily, this does not really seem to >> impact performance. >> >> <details> >> <summary>Performance numbers</summary> >> x64: >> >> before: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op >> >> >> after: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op >> >> >> aarch64: >> >> before: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op >> >> >> after: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op >> >> </details> >> >> As for the added TestUpcallStress test, it takes about 800 seconds to run >> this test on the dev machine I'm using, so I've set the timeout quite high. >> Since it runs for so long, I've dropped it from the default `jdk_foreign` >> test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > src/hotspot/cpu/riscv/upcallLinker_riscv.cpp line 264: > >> 262: >> 263: __ block_comment("{ load target "); >> 264: __ movptr(j_rarg0, (intptr_t) receiver); > > Hi @JornVernee , Could you please apply following small add-on change for > linux-riscv64? As I witnessed build warning with GCC-13. Otherwise, builds > fine and the newly-added test/jdk/java/foreign/TestUpcallStress.java is > passing. (PS: jdk_foreign tests are passing too) > > > diff --git a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp > b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp > index 5c45a679316..55160be99d0 100644 > --- a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp > +++ b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp > @@ -261,7 +261,7 @@ address UpcallLinker::make_upcall_stub(jobject receiver, > Symbol* signature, > __ block_comment("} argument shuffle"); > > __ block_comment("{ load target "); > - __ movptr(j_rarg0, (intptr_t) receiver); > + __ movptr(j_rarg0, (address) receiver); > __ far_call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // > loads Method* into xmethod > __ block_comment("} load target "); > > diff --git a/test/jdk/java/foreign/TestUpcallStress.java > b/test/jdk/java/foreign/TestUpcallStress.java > index 3b9b1d4b207..40607746856 100644 > --- a/test/jdk/java/foreign/TestUpcallStress.java > +++ b/test/jdk/java/foreign/TestUpcallStress.java > @@ -24,7 +24,7 @@ > /* > * @test > * @requires jdk.foreign.linker != "FALLBACK" > - * @requires os.arch == "aarch64" & os.name == "Linux" > + * @requires (os.arch == "aarch64" | os.arch=="riscv64") & os.name == "Linux" > * @requires os.maxMemory > 4G > * @modules java.base/jdk.internal.foreign > * @build NativeTestHelper CallGeneratorHelper TestUpcallBase
Were you able to reproduce the original issue on RISC-V? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743643186