As discussed in the JBS issue:

FFM upcall stubs embed a `Method*` of the target method in the stub. This 
`Method*` is read from the `LambdaForm::vmentry` field associated with the 
target method handle at the time when the upcall stub is generated. The MH 
instance itself is stashed in a global JNI ref. So, there should be a 
reachability chain to the holder class: `MH (receiver) -> LF (form) -> 
MemberName (vmentry) -> ResolvedMethodName (method) -> Class<?> (vmholder)`

However, it appears that, due to multiple threads racing to initialize the 
`vmentry` field of the `LambdaForm` of the target method handle of an upcall 
stub, it is possible that the `vmentry` is updated _after_ we embed the 
corresponding `Method`* into an upcall stub (or rather, the latest update is 
not visible to the thread generating the upcall stub). Technically, it is fine 
to keep using a 'stale' `vmentry`, but the problem is that now the reachability 
chain is broken, since the upcall stub only extracts the target `Method*`, and 
doesn't keep the stale `vmentry` reachable. The holder class can then be 
unloaded, resulting in a crash.

The fix I've chosen for this is to mimic what we already do in 
`MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the 
target method handle each time. Luckily, this does not really seem to impact 
performance.

<details>
<summary>Performance numbers</summary>
x64:

before:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op


after:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op


aarch64:

before:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op


after:

Benchmark             Mode  Cnt   Score   Error  Units
Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op

</details>

As for the added TestUpcallStress test, it takes about 800 seconds to run this 
test on the dev machine I'm using, so I've set the timeout quite high. Since it 
runs for so long, I've dropped it from the default `jdk_foreign` test suite, 
which runs in tier2. Instead the new test will run in tier4.

Testing: tier 1-4

-------------

Commit messages:
 - Add s390 changes
 - Merge branch 'master' into LoadVMTraget
 - Don't save/restore LR/CR + resolve_jobject on s390
 - eyeball other platforms
 - call stub from upcall stub
 - reinit_heap_base
 - eyeball other platforms
 - Only test on Linux/AArch64
 - aarch64 impl
 - load vmentry in on_entry using special stub
 - ... and 8 more: https://git.openjdk.org/jdk/compare/0e6bb514...8dcb14ff

Changes: https://git.openjdk.org/jdk/pull/20479/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20479&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337753
  Stats: 334 lines in 23 files changed: 257 ins; 26 del; 51 mod
  Patch: https://git.openjdk.org/jdk/pull/20479.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20479/head:pull/20479

PR: https://git.openjdk.org/jdk/pull/20479

Reply via email to