RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

Patricio Chilano Mateo Wed, 06 Nov 2024 09:40:56 -0800

This is the implementation of JEP 491: Synchronize Virtual Threads without 
Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further 
details.


In order to make the code review easier the changes have been split into the 
following initial 4 commits:

- Changes to allow unmounting a virtual thread that is currently holding 
monitors.
- Changes to allow unmounting a virtual thread blocked on synchronized trying 
to acquire the monitor.
- Changes to allow unmounting a virtual thread blocked in `Object.wait()` and 
its timed-wait variants.
- Changes to tests, JFR pinned event, and other changes in the JDK libraries.

The changes fix pinning issues for all 4 ports that currently implement 
continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added 
recently and stand in their own commit after the initial ones.

The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default 
locking mode (and `LM_MONITOR`, which comes for free), but not when using 
`LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated 
([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the 
intention to remove `LM_LEGACY` code in future releases.


## Summary of changes

### Unmount virtual thread while holding monitors

As stated in the JEP, currently when a virtual thread enters a synchronized 
method or block, the JVM records the virtual thread's carrier platform thread 
as holding the monitor, not the virtual thread itself. This prevents the 
virtual thread from being unmounted from its carrier, as ownership information 
would otherwise go wrong. In order to fix this limitation we will do two things:

- We copy the oops stored in the LockStack of the carrier to the stackChunk 
when freezing (and clear the LockStack). We copy the oops back to the LockStack 
of the next carrier when thawing for the first time (and clear them from the 
stackChunk). Note that we currently assume carriers don't hold monitors while 
mounting virtual threads.

- For inflated monitors we now record the `java.lang.Thread.tid` of the owner 
in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us 
to tie the owner of the monitor to a `java.lang.Thread` instance, rather than 
to a JavaThread which is only created per platform thread. The tid is already a 
64 bit field so we can ignore issues of the counter wrapping around.
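
The second point can be illustrated with a toy model in plain Java. This is only a sketch of the idea of keying ownership on a thread id instead of a thread pointer; it is not the HotSpot code. The class `TidOwnedLock`, its methods, and the use of `0` as the "no owner" value are all hypothetical names and choices made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model only, not HotSpot code: the owner is a thread id, not a thread reference. */
public class TidOwnedLock {
    // Assumption for this sketch: 0 is never a valid id (Thread.threadId() is positive).
    private static final long NO_OWNER = 0L;

    private final AtomicLong owner = new AtomicLong(NO_OWNER);

    /** Non-reentrant try-lock: succeeds only if nobody owns the lock. */
    public boolean tryEnter() {
        return owner.compareAndSet(NO_OWNER, Thread.currentThread().threadId());
    }

    public boolean isOwnedByCurrentThread() {
        return owner.get() == Thread.currentThread().threadId();
    }

    public void exit() {
        long self = Thread.currentThread().threadId();
        if (!owner.compareAndSet(self, NO_OWNER)) {
            throw new IllegalMonitorStateException("current thread is not the owner");
        }
    }

    /** Returns true if a virtual thread still owns the lock after yielding. */
    public static boolean demo() {
        TidOwnedLock lock = new TidOwnedLock();
        boolean[] results = new boolean[3];
        Thread vt = Thread.ofVirtual().start(() -> {
            results[0] = lock.tryEnter();
            Thread.yield(); // the thread may resume on another carrier; its id is unchanged
            results[1] = lock.isOwnedByCurrentThread();
            lock.exit();
            results[2] = !lock.isOwnedByCurrentThread();
        });
        try {
            vt.join(); // join makes the writes to results visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return results[0] && results[1] && results[2];
    }

    public static void main(String[] args) {
        System.out.println("ownership survived yield: " + demo());
    }
}
```

Because the id belongs to the `java.lang.Thread` instance, the ownership check gives the same answer regardless of which carrier the virtual thread is mounted on.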

#### General notes about this part:

- Since virtual threads don't need to worry about holding monitors anymore, we 
don't need to count them, except for `LM_LEGACY`. So the majority of the 
platform dependent changes in this commit have to do with correcting this.
- Zero and x86 (32 bits) were counting monitors even though they don't 
implement continuations, so I fixed that to stop counting. The idea is to 
remove all the counting code once we remove `LM_LEGACY`.
- Macro `LOOM_MONITOR_SUPPORT` was added at the time to exclude ports that 
implement continuations but don't yet implement monitor support. It is removed 
later with the ppc commit changes.
- Since now a virtual thread can be unmounted while holding monitors, JVMTI 
methods `GetOwnedMonitorInfo` and `GetOwnedMonitorStackDepthInfo` had to be 
adapted.

#### Notes specific to the tid changes:

- The tid is cached in the JavaThread object under `_lock_id`. It is set on 
JavaThread creation and changed on mount/unmount.
- Changes in the ObjectMonitor class in this commit are pretty much exclusively 
related to changing `_owner` and `_succ` from `void*` and `JavaThread*` 
respectively to `int64_t`.
- Although we are not trying to fix `LM_LEGACY` the tid changes apply to it as 
well since the inflated path is shared. Thus, in case of inflation by a 
contending thread, the `BasicLock*` cannot be stored in the `_owner` field as 
before. The `_owner` is instead set to anonymous as we do in `LM_LIGHTWEIGHT`, 
and the `BasicLock*` is stored in the new field `_stack_locker`.
- We already assume 32 bit platforms can handle 64 bit atomics, including 
`cmpxchg` ([JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776)) so the 
shared code can stay the same. The assembly code for the c2 fast paths has to 
be adapted though. On arm (32bits) we already jump directly to the slow path on 
inflated monitor case so there is nothing to do. For x86 (32bits), since the 
port is moving towards deprecation 
([JDK-8338285](https://bugs.openjdk.org/browse/JDK-8338285)) there is no point 
in trying to optimize, so the code was changed to do the same thing we do for 
arm (32bits).

### Unmounting a virtual thread blocked on synchronized

Currently virtual thread unmounting is always started from Java, either because 
of a voluntary call to `Thread.yield()` or because of performing some 
blocking operation such as I/O. Now we allow unmounting from inside the VM too, 
specifically when facing contention trying to acquire a Java monitor.

On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread 
will call freeze to copy all Java frames to the heap. We will add the virtual 
thread to the ObjectMonitor's queue and return to Java. Instead of 
continuing execution in Java though, the virtual thread will jump to a preempt 
stub which will clear the frames copied from the physical stack, and will 
return to `Continuation.run()` to proceed with the unmount logic. Once the 
owner releases the monitor and selects it as the next successor, the virtual 
thread will be added back to the scheduler queue. The virtual 
thread will then run and attempt to acquire the monitor again. If it succeeds 
it will thaw frames as usual to continue execution where it left off. If it 
fails it will unmount and wait again to be unblocked.
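
From the Java side, the scenario described above is simply several virtual threads contending on one monitor while the owner blocks. The following is a minimal sketch of such a workload, not one of the tests in this PR; the class name `ContendedMonitorDemo` and its `run` method are hypothetical. It behaves the same with or without these changes; the difference is only whether the contending threads pin their carriers while they wait.

```java
import java.util.ArrayList;
import java.util.List;

public class ContendedMonitorDemo {
    private static final Object LOCK = new Object();
    private static int counter; // guarded by LOCK

    /** Starts the given number of virtual threads that all contend on LOCK. */
    public static int run(int threads) {
        synchronized (LOCK) {
            counter = 0;
        }
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            started.add(Thread.ofVirtual().start(() -> {
                synchronized (LOCK) {
                    try {
                        Thread.sleep(1); // block while holding the monitor
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    counter++;
                }
            }));
        }
        for (Thread t : started) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        synchronized (LOCK) {
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println("increments: " + run(50));
    }
}
```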

#### General notes about this part:

- The easiest way to review these changes is to start from the monitorenter 
call in the interpreter and follow all the flow of the virtual thread, from 
unmounting to running again.
- Currently we use a dedicated unblocker thread to submit the virtual threads 
back to the scheduler queue. This avoids calls to Java from monitorexit. We are 
experimenting with removing this limitation, but that will be left as an 
enhancement for a future change.
- We cannot unmount the virtual thread when the monitor enter call is coming 
from `jni_enter()` or `ObjectLocker` since we would need to freeze native 
frames.
- If freezing fails, which will almost always be due to having native frames on 
the stack, the virtual thread will follow the normal platform thread logic but 
will do a timed-park instead. This is to alleviate some deadlock cases where 
the successor picked is an unmounted virtual thread that cannot run, which can 
happen during class loading or class initialization.
- After freezing all frames, and while adding itself to the `_cxq`, the virtual 
thread could have successfully acquired the monitor. In that case we mark the 
preemption as cancelled. The virtual thread will still need to go back to the 
preempt stub to clean up the physical stack, but instead of unmounting it will 
call thaw to continue execution.
- The way we jump to the preempt stub is slightly different in the compiler and 
interpreter. For the compiled case we just patch a return address, so no new 
code is added. For the interpreter we cannot do this on all platforms so we 
just check a flag back in the interpreter. For the latter we also need to 
manually restore some state after we finally acquire the monitor and resume 
execution. All that logic is contained in new assembler method 
`call_VM_preemptable()`.

#### Notes specific to JVMTI changes:
- Since we are not unmounting from Java, there is no call to 
`VirtualThread.yieldContinuation()`. This means that we have to execute the 
equivalent of `notifyJvmtiUnmount(/*hide*/true)` for unmount, and of 
`notifyJvmtiMount(/*hide*/false)` for mount in the VM. The former is 
implemented with `JvmtiUnmountBeginMark` in `Continuation::try_preempt()`. The 
latter is implemented in method `jvmti_mount_end()` in `ContinuationFreezeThaw` 
at the end of thaw.
- When unmounting from Java the vthread unmount event is posted before we try 
to freeze the continuation. If that fails then we post the mount event. This 
all happens in `VirtualThread.yieldContinuation()`. When unmounting from the VM 
we only post the event once we know the freeze succeeded. Since at that point 
we are in the middle of the VTMS transition, posting the event is done in 
`JvmtiVTMSTransitionDisabler::VTMS_unmount_end()` after the transition 
finishes. Maybe the same thing should be done when unmounting from Java.

### Unmounting a virtual thread blocked on `Object.wait()`

This commit just extends the previous mechanism to be able to unmount inside 
the VM on `ObjectMonitor::wait`.
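
As an illustration of the kind of code this affects, here is a minimal sketch of a virtual thread blocking in `Object.wait()` until it is notified by a platform thread. The class `WaitNotifyDemo` and its members are hypothetical and not part of the PR. The program produces the same result with or without these changes; what changes is whether the waiting virtual thread can unmount from its carrier.

```java
public class WaitNotifyDemo {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock

    /** Returns true if the virtual thread observed the condition and finished. */
    public static boolean run() {
        WaitNotifyDemo d = new WaitNotifyDemo();
        boolean[] woke = new boolean[1];
        Thread waiter = Thread.ofVirtual().start(() -> {
            synchronized (d.lock) {
                while (!d.ready) { // loop guards against spurious wakeups
                    try {
                        d.lock.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                woke[0] = true;
            }
        });
        // Notify from the calling (platform) thread. If this runs before the
        // waiter reaches wait(), the waiter sees ready == true and never waits.
        synchronized (d.lock) {
            d.ready = true;
            d.lock.notifyAll();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return woke[0];
    }

    public static void main(String[] args) {
        System.out.println("waiter completed: " + run());
    }
}
```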

#### General notes about this part:
- The mechanism works as before with the difference that now the call will come 
from the native wrapper. This requires adding support to the continuation code 
to handle native wrapper frames, which is a main part of the changes in this 
commit.
- Both the compiled and interpreted native wrapper code will check for 
preemption on return from the wait call, after we have transitioned back to 
`_thread_in_Java`.

#### Note specific to JVMTI changes:
- If the monitor waited event is enabled we need to post it after the wait is 
done but before re-acquiring the monitor. Since the virtual thread is inside 
the VTMS transition at that point, we cannot do that directly. Currently in the 
code we end the transition, post the event and start the transition again. This 
is not ideal, and maybe we should unmount, post the event and then run again to 
try to reacquire the monitor.


### Test changes + JFR Updates + Library code changes

#### Tests 

- The tests in `java/lang/Thread/virtual` are updated to add more tests for 
monitor enter/exit and Object.wait/notify. New tests are added for JFR events, 
synchronized native methods, and stress testing for several scenarios.
- `test/hotspot/gtest/nmt/test_vmatree.cpp` is changed due to an alias that 
conflicts. 
- A small number of tests, e.g. 
`test/hotspot/jtreg/serviceability/sa/ClhsdbInspect.java` and 
`test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002`, are 
updated so they are in sync with the JDK code. 
- A number of JVMTI tests are updated to fix various issues, e.g. some tests 
saved a JNIEnv in a static. 

#### Diagnosing remaining pinning issues

- The diagnostic option `jdk.tracePinnedThreads` is removed. 
- The JFR `jdk.VirtualThreadPinned` event is changed so that it's now recorded 
in the VM, and for the following cases: parking when pinned, blocking in 
monitor enter when pinned, Object.wait when pinned, and waiting for a class to 
be initialized by another thread. The changes to object monitors should mean 
that only a few events are recorded. Future work may change this to a sampling 
approach.
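
One way to observe the event from application code is to stream it with the standard `jdk.jfr.consumer.RecordingStream` API. The following is a rough sketch, assuming the `jdk.jfr` module is available; the class `PinnedEventDemo` and the method `countPinnedEvents` are hypothetical names. How many events are seen depends on the JDK build and on the workload, so the sketch makes no claim about the count.

```java
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.consumer.RecordingStream;

public class PinnedEventDemo {
    /** Runs the workload in a virtual thread and counts jdk.VirtualThreadPinned events. */
    public static int countPinnedEvents(Runnable workload) {
        AtomicInteger seen = new AtomicInteger();
        try (RecordingStream rs = new RecordingStream()) {
            // Record every occurrence, with stack traces, regardless of duration.
            rs.enable("jdk.VirtualThreadPinned").withoutThreshold().withStackTrace();
            // A real handler could also print the event to see where pinning happened.
            rs.onEvent("jdk.VirtualThreadPinned", event -> seen.incrementAndGet());
            rs.startAsync();

            Thread t = Thread.ofVirtual().start(workload);
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            rs.stop(); // waits until recorded events have been consumed
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Object lock = new Object();
        int n = countPinnedEvents(() -> {
            synchronized (lock) {
                try {
                    Thread.sleep(50); // blocks while holding a monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        System.out.println("pinned events observed: " + n);
    }
}
```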

#### Other changes to VirtualThread class

The VirtualThread implementation includes a few robustness changes. The 
`park/parkNanos` methods now park on the carrier if the freeze throws OOME. 
Moreover, the use of transitions is reduced so that the call out to the 
scheduler no longer requires a temporary transition. 

#### Other changes to libraries: 

- `ReferenceQueue` is reverted to use `synchronized`, the subclass based on 
`ReentrantLock` is removed. This change is done now because the changes for 
object monitors impact this area when there is preemption polling a reference 
queue. 
- `java.io` is reverted to use `synchronized`. This change has been important 
for testing virtual threads. There will be follow-up cleanup in main-line after 
the JEP is integrated to remove `InternalLock` and its uses in `java.io`. 
- The epoll and kqueue based Selectors are changed to preempt when doing 
blocking selects. This has been useful for testing virtual threads with some 
libraries, e.g. JDBC drivers. We could potentially separate this update if 
needed but it has been included in all testing and EA builds. 
- `sun.security.ssl.X509TrustManagerImpl` is changed to eagerly initialize 
AnchorCertificates, a forced change due to deadlocks in this code when testing. 

## Testing 

The changes have been running in the Loom pipeline for several months now. They 
have also been included in EA builds throughout the year at different stages 
(EA builds from earlier this year did not have Object.wait() support yet but 
more recent ones did) so there has been some external exposure too.

The current patch has been run through mach5 tiers 1-8. I'll keep running tests 
periodically until integration time.

-------------

Commit messages:
 - Use is_top_frame boolean in FreezeBase::check_valid_fast_path()
 - Move load of _lock_id in C2_MacroAssembler::fast_lock
 - Add --enable-native-access=ALL-UNNAMED to SynchronizedNative.java
 - Update comment for _cont_fastpath
 - Add ReflectionCallerCacheTest.java to test/jdk/ProblemList-Xcomp.txt
 - Use ThreadIdentifier::initial() in ObjectMonitor::owner_from()
 - Fixes to JFR metadata.xml
 - Fix return miss prediction in generate_native_entry for riscv
 - Fix s390x failures
 - Add oopDesc::has_klass_gap() check
 - ... and 70 more: https://git.openjdk.org/jdk/compare/751a914b...211c6c81

Changes: https://git.openjdk.org/jdk/pull/21565/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8338383
  Stats: 9914 lines in 246 files changed: 7105 ins; 1629 del; 1180 mod
  Patch: https://git.openjdk.org/jdk/pull/21565.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565

PR: https://git.openjdk.org/jdk/pull/21565
