When inflating a monitor, the `ObjectMonitor*` is written directly over the `markWord`, and any overwritten data is saved in a displaced `markWord`. This is problematic for concurrent GCs, which need extra care or looser semantics to use this displaced data. In Lilliput this data also contains the klass, forcing this to be something that the GC has to take into account everywhere.
This patch introduces an alternative solution where locking only uses the lock bits of the `markWord` and inflation does not overwrite and displace the `markWord`. This is done by keeping associations between objects and `ObjectMonitor*` in an external hash table. Different caching techniques are used to speed up lookups from compiled code.

A diagnostic VM option called `UseObjectMonitorTable` is introduced. It is only supported in combination with the `LM_LIGHTWEIGHT` locking mode (the default).

This patch has been evaluated to be performance neutral when `UseObjectMonitorTable` is turned off (the default).

Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` and `UseObjectMonitorTable` work.

# Cleanups

Cleaned up displaced header usage for:
* BasicLock
  * Contains some Zero changes
  * Renames one exported JVMCI field
* ObjectMonitor
  * Updates comments and tests for consistency

# Refactoring

`ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark` witness object has been introduced to the signatures, which signals that the contentions reference counter is being held. More details are given below in the section about deflation.

The initial purpose of this was to allow `UseObjectMonitorTable` to interact more seamlessly with the `ObjectMonitor::enter` code.

_There is even more `ObjectMonitor` refactoring which can be done here to create a more understandable and enforceable API. There are a handful of invariants / assumptions which are not always explicitly asserted which could be trivially abstracted and verified by the type system by using similar witness objects._

# LightweightSynchronizer

Working on adapting and incorporating the following section as a comment in the source code.

## Fast Locking

CAS on the locking bits in the markWord: 0b00 (Fast Locked) <--> 0b01 (Unlocked).

When locking and 0b00 (Fast Locked) is observed, it may be beneficial to avoid inflating by spinning a bit.
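A minimal sketch of the fast-locking transition, assuming the markWord can be modelled as a `std::atomic<std::uintptr_t>` whose two low bits are the lock bits. `Header`, `cas_lock_bits`, `try_fast_lock`, and `try_fast_unlock` are hypothetical names, and the lock-stack bookkeeping that accompanies a real fast lock is omitted:

```c++
#include <atomic>
#include <cassert>
#include <cstdint>

// Lock-bit encodings from the description; the remaining bits of the
// markWord are treated as an opaque payload that must be preserved.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Hypothetical stand-in for an object header holding only the markWord.
struct Header {
  std::atomic<std::uintptr_t> mark;
  explicit Header(std::uintptr_t m) : mark(m) {}
};

// Swap the lock bits from `from` to `to` with a single CAS, keeping the
// payload bits unchanged. Returns false if the observed lock bits are not
// `from` or the CAS loses a race; the caller decides whether to retry,
// spin, or inflate.
bool cas_lock_bits(Header& h, std::uintptr_t from, std::uintptr_t to) {
  assert((from & ~kLockMask) == 0 && (to & ~kLockMask) == 0);
  std::uintptr_t observed = h.mark.load(std::memory_order_relaxed);
  if ((observed & kLockMask) != from) {
    return false;
  }
  std::uintptr_t desired = (observed & ~kLockMask) | to;
  return h.mark.compare_exchange_strong(observed, desired);
}

// 0b01 (Unlocked) -> 0b00 (Fast Locked).
bool try_fast_lock(Header& h) { return cas_lock_bits(h, kUnlocked, kFastLocked); }

// 0b00 (Fast Locked) -> 0b01 (Unlocked). Fails if the object was inflated in
// the meantime, in which case the inflated unlock path must be used instead.
bool try_fast_unlock(Header& h) { return cas_lock_bits(h, kFastLocked, kUnlocked); }
```

Only the two lock bits change while the rest of the word is preserved, which is the property that lets inflation with the table leave the `markWord` intact.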
If 0b10 (Inflated) is observed, or if there is too much contention or the critical sections are too long for spinning to be feasible, inflated locking is performed.

### Fast Lock Spinning (UseObjectMonitorTable)

When a thread fails to fast lock an object whose monitor is not yet inflated, it will spin on the markWord using an exponential backoff scheme. The thread will attempt the fast lock CAS and then SpinWait() for some time, doubling with every failed attempt, up to a maximum number of attempts. There is a diagnostic VM option LightweightFastLockingSpins which can be used to tune this value. The behavior of SpinWait() can be hardware dependent.

A future improvement may be to adapt this spinning limit to observed behavior, which would automatically adapt to the different hardware behavior of SpinWait().

## Inflated Locking

Inflated locking means that an ObjectMonitor is associated with the object and is used for locking instead of the locking bits in the markWord.

## Inflated Locking without table (!UseObjectMonitorTable)

An inflating thread will create an ObjectMonitor and CAS the ObjectMonitor* into the markWord along with the 0b10 (Inflated) lock bits. If the transition of the lock bits is from 0b00 (Fast Locked), the ObjectMonitor must be published with an anonymous owner (setting _owner to ANONYMOUS_OWNER). If the transition of the lock bits is from 0b01 (Unlocked), the ObjectMonitor is published with no owner.

When encountering an ObjectMonitor with an anonymous owner, the thread checks its lock stack to see if it is the owner. If so, it removes the object from its lock stack and sets itself as the owner of the ObjectMonitor, fixing the recursion level to correspond to the number of removed lock stack entries.
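The inflation path without the table, including the anonymous-owner fix-up, can be sketched as follows. This assumes simplified stand-ins rather than HotSpot's classes: owners are encoded as integers (`kAnonymousOwner` plays the role of ANONYMOUS_OWNER), the lock stack is a `std::vector`, and `Monitor`, `Object`, `ThreadState`, `inflate`, and `fix_anonymous_owner` are hypothetical names. It also assumes that the first lock stack entry is the initial acquisition, so the recursion count is the number of removed entries minus one.

```c++
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Hypothetical integer owner encoding: real thread ids are >= 2.
constexpr std::uintptr_t kNoOwner        = 0;
constexpr std::uintptr_t kAnonymousOwner = 1;  // plays the role of ANONYMOUS_OWNER

struct Monitor {
  std::atomic<std::uintptr_t> owner{kNoOwner};
  std::size_t recursions = 0;
  std::uintptr_t displaced_mark = 0;  // the markWord overwritten by inflation
};
// The low two bits of a Monitor* must be free to hold the lock bits.
static_assert(alignof(Monitor) >= 4, "Monitor* needs two free low bits");

struct Object {
  std::atomic<std::uintptr_t> mark;
  explicit Object(std::uintptr_t m) : mark(m) {}
};

struct ThreadState {
  std::uintptr_t id;
  std::vector<Object*> lock_stack;  // recursive fast locks push the object again
};

// CAS `fresh` into the markWord with 0b10 (Inflated) lock bits and return
// the monitor that ends up installed (ours, or a racing thread's).
Monitor* inflate(Object& obj, Monitor* fresh) {
  std::uintptr_t observed = obj.mark.load();
  for (;;) {
    std::uintptr_t bits = observed & kLockMask;
    if (bits == kInflated) {
      return reinterpret_cast<Monitor*>(observed & ~kLockMask);
    }
    assert(bits == kFastLocked || bits == kUnlocked);  // 0b11 is not modelled
    // From Fast Locked the owner is unknown here, so publish it as anonymous;
    // from Unlocked publish it with no owner.
    fresh->owner.store(bits == kFastLocked ? kAnonymousOwner : kNoOwner);
    fresh->displaced_mark = observed;
    std::uintptr_t desired = reinterpret_cast<std::uintptr_t>(fresh) | kInflated;
    if (obj.mark.compare_exchange_strong(observed, desired)) {
      return fresh;
    }
    // CAS failed: `observed` now holds the current markWord; retry.
  }
}

// A thread that finds an anonymous owner checks its lock stack. If it holds
// entries for `obj`, it removes them, becomes the owner, and turns the extra
// entries into recursions.
bool fix_anonymous_owner(ThreadState& self, Object& obj, Monitor& m) {
  if (m.owner.load() != kAnonymousOwner) {
    return false;
  }
  auto& stack = self.lock_stack;
  auto new_end = std::remove(stack.begin(), stack.end(), &obj);
  std::size_t removed = static_cast<std::size_t>(stack.end() - new_end);
  if (removed == 0) {
    return false;  // another thread fast locked the object; it will claim ownership
  }
  stack.erase(new_end, stack.end());
  m.owner.store(self.id);
  m.recursions = removed - 1;  // assumption: first entry is the initial acquisition
  return true;
}
```

Only the fast-locking thread can find the object on its own lock stack, so the plain store that replaces the anonymous owner does not race with other claimants in this model.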
## Inflated Locking with table (UseObjectMonitorTable)

Because publishing the ObjectMonitor* and signaling that an object's monitor is inflated is not atomic, more care must be taken (in the presence of deflation) so that all threads agree on which ObjectMonitor* to use.

When encountering an ObjectMonitor with an anonymous owner, the thread checks its lock stack to see if it is the owner. If so, it removes the object from its lock stack and sets itself as the owner of the ObjectMonitor, fixing the recursion level to correspond to the number of removed lock stack entries.

All complications arise from deflation, that is, the process of disassociating an ObjectMonitor from its Java Object. So the mechanism used for deflation is explained first, followed by retrieval and creation of ObjectMonitors.

### Deflation

An ObjectMonitor can only be deflated if it has no owner, its queues are empty, and no thread is in a scope where it has incremented and checked the contentions reference counter.

The interaction between deflation and wait is handled by having the owner and the wait queue entry overlap to block out deflation; the wait queue entry is protected by a waiters reference counter, which is only modified by the waiters while holding the monitor: it is incremented before exiting the monitor and decremented after reentering the monitor.

For enter and exit, where the deflator may observe empty queues and no owner, a two-step mechanism is used to synchronize deflation with concurrently locking threads; deflation is synchronized using the contentions reference counter.

In the text below we refer to "holding the contentions reference counter". This means that a thread has incremented the contentions reference counter and verified that it is not negative.
```c++
// Increment first, then check the old value: a negative value means the
// monitor has already been deflated.
if (Atomic::fetch_and_add(&monitor->_contentions, 1) >= 0) {
  // holding the contentions reference counter
}
// The increment is always undone, whether or not the counter was held.
Atomic::decrement(&monitor->_contentions);
```

#### Deflation protocol

The first step for the deflator is to try to CAS the owner from no owner to a special marker (DEFLATER_MARKER). If this is successful, it blocks any entering thread from successfully installing itself as the owner and causes compiled code to take a slow path and call into the runtime.

The second step for the deflator is to check the waiters reference counter and, if it is 0, try to CAS the contentions reference counter from 0 to a large negative value (INT_MIN). If this succeeds, the monitor is deflated.

The deflator does not have to check the entry queues because every thread on the entry queues must either hold the contentions reference counter or have incremented the waiters reference counter (in the case where it was moved from the wait queue to the entry queues by a notify). The deflator checks the waiters reference counter with the following memory ordering: Waiter: { increment waiters reference counter; release owner }, Deflator: { acquire owner; check waiters reference counter }. All threads on the entry queues or the wait queue invariantly hold the contentions reference counter or the waiters reference counter.

#### Deflation cleanup

If deflation succeeds, the locking bits are then transitioned back to 0b01 (Unlocked). With UseObjectMonitorTable it is required that this is done by the deflator, or it could lead to ABA problems in the locking bits. Without the table the whole ObjectMonitor* is part of the markWord transition, with its pointer being phased out of the system with a handshake, making every value distinguishable and avoiding ABA issues.

For UseObjectMonitorTable the deflated monitor is also removed from the table. This is done after transitioning the markWord to allow concurrently entering threads to fast lock on the object while the monitor is being removed from the hash table.
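The two deflation steps can be modelled with `std::atomic` as below, together with an entering thread that holds the contentions reference counter and may cancel an in-progress deflation (the cancellation path is described in the paragraph that follows). This is a single-monitor sketch under stated assumptions: owners are integers, `kDeflaterMarker` plays the role of DEFLATER_MARKER, memory ordering is left at the sequentially consistent default, a contended monitor simply returns `kBusy` instead of spinning or blocking, and the markWord and table cleanup are left out. `Monitor`, `begin_deflation`, `finish_deflation`, and `try_enter` are hypothetical names.

```c++
#include <atomic>
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical integer owner encoding: real thread ids are >= 2.
constexpr std::uintptr_t kNoOwner        = 0;
constexpr std::uintptr_t kDeflaterMarker = 1;  // plays the role of DEFLATER_MARKER

struct Monitor {
  std::atomic<std::uintptr_t> owner{kNoOwner};
  std::atomic<int> contentions{0};
  std::atomic<int> waiters{0};
};

// Deflator, step 1: CAS no owner -> marker so that no entering thread can
// install itself as the owner.
bool begin_deflation(Monitor& m) {
  std::uintptr_t expected = kNoOwner;
  return m.owner.compare_exchange_strong(expected, kDeflaterMarker);
}

// Deflator, step 2: with no waiters, CAS contentions 0 -> INT_MIN. On failure,
// restore the owner; if the marker is already gone, an entering thread has
// cancelled the deflation and left one extra count, which is dropped here.
bool finish_deflation(Monitor& m) {
  int expected = 0;
  if (m.waiters.load() == 0 &&
      m.contentions.compare_exchange_strong(expected, INT_MIN)) {
    return true;  // deflated; markWord reset and table removal would follow
  }
  std::uintptr_t marker = kDeflaterMarker;
  if (!m.owner.compare_exchange_strong(marker, kNoOwner)) {
    m.contentions.fetch_sub(1);  // balance the cancelling thread's extra +1
  }
  return false;
}

enum class EnterResult { kEntered, kBusy, kDeflated };

// Entering thread: hold the contentions reference counter while trying to
// take the owner field, cancelling an in-progress deflation if the marker is seen.
EnterResult try_enter(Monitor& m, std::uintptr_t self) {
  if (m.contentions.fetch_add(1) < 0) {
    m.contentions.fetch_sub(1);
    return EnterResult::kDeflated;  // caller must look up or create a new monitor
  }
  EnterResult result = EnterResult::kBusy;  // a real thread would spin or block
  std::uintptr_t expected = kNoOwner;
  if (m.owner.compare_exchange_strong(expected, self)) {
    result = EnterResult::kEntered;
  } else if (expected == kDeflaterMarker &&
             m.owner.compare_exchange_strong(expected, self)) {
    m.contentions.fetch_add(1);  // the extra +1 of the "+1 +1 -1" scheme
    result = EnterResult::kEntered;
  }
  m.contentions.fetch_sub(1);  // stop holding the counter
  return result;
}
```

Splitting the deflator into two functions makes the interesting interleaving (an entering thread arriving between the two steps) easy to replay sequentially: the entering thread takes the marker, the deflator's contentions CAS fails, its owner restore fails, and it decrements on the entering thread's behalf.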
If deflation fails after the marker (DEFLATER_MARKER) has been CASed into the owner field, the owner must be restored. From the deflation thread's point of view it is as simple as CASing from the marker to no owner. However, so that threads do not depend on the deflation thread making progress here, we allow any thread to CAS from the marker if that thread has both incremented and checked the contentions counter. This thread has now effectively canceled the deflation, but it is important that the deflator observes this fact; we do this by deliberately not decrementing the contentions counter. The effect is that the contentions CAS will fail, which will force the deflator to try to restore the owner, but this will also fail because the deflation got canceled. So the deflator decrements the contentions counter instead, on behalf of the canceling thread, to balance the reference counting. (Currently this is implemented by doing a +1 +1 -1 reference count on the locking thread, but a simple +1 only would suffice.)

### Retrieve ObjectMonitor

#### HashTable

The HashTable maintains a mapping between Java Objects and ObjectMonitors. Lookups are done via the object's identity_hash. If the hash table contains an ObjectMonitor for a specific object, then that ObjectMonitor is used for locking unless it is being deflated. Only deflation removes entries for live (not dead) objects from the HashTable.

#### ThreadLocal Cache (UseObjectMonitorTable)

The ObjectMonitors most recently locked by a thread are cached in that thread's local storage. These are used to elide hash table lookups. These caches use raw oops to make cache lookups trivial. However, this requires special handling of the cache at safepoints. The caches are cleared when a safepoint is triggered (instead of letting the GC visit them), which avoids keeping cache entries as GC roots. These cache entries may become deflated, but locking on such a monitor still participates in the normal deflation protocol.
Because these entries are cleared during a safepoint, the handshake performed by monitor deflation to phase out ObjectMonitor* from the system will also phase these out.

#### StackLocal Cache

Each monitorenter has a corresponding BasicLock entry on the stack. Each successful inflated monitorenter saves the ObjectMonitor* inside this BasicLock entry and retrieves it when performing the corresponding monitorexit. This means it is important that the BasicLock entry is always initialized to a known state (nullptr is used). The RAII class CacheSetter is used to ensure that the BasicLock gets initialized before leaving the runtime code, and that both caches get updated correctly (only once, with the same locked ObjectMonitor).

The cache entries are set when a monitor is entered and never used again after that monitor has been exited. So there are no interactions with deflation here. Similarly, these caches do not track the associated oop, but rely on the fact that the same BasicLock data created for a monitorenter is used when executing the corresponding monitorexit.

### Creating ObjectMonitor

If retrieval of the ObjectMonitor fails because there is no ObjectMonitor, either because this is the first time inflating or because the ObjectMonitor has been deflated, a new ObjectMonitor must be created and associated with the object. The inflating thread will then attempt to insert a newly created ObjectMonitor in the hash table. The important invariant is that any ObjectMonitor inserted must have an anonymous owner (setting _owner to ANONYMOUS_OWNER).

This solves the issue of not being able to atomically insert the ObjectMonitor in the hash table and transition the markWord to 0b10 (Inflated). We instead have all inflating threads insert an identical anonymously owned ObjectMonitor in the table and then decide ownership based on how the markWord is transitioned to 0b10 (Inflated). Note: Only one ObjectMonitor can be inserted.
The anonymous owner also has the effect of blocking deflation of a newly inserted ObjectMonitor until the contentions reference counter can be incremented. The contentions reference counter is held while transitioning the markWord to block out deflation.

* If a thread observes 0b10 (Inflated)
  * If the current thread is the thread that fast locked, take ownership. Update ObjectMonitor _recursions based on fast locked recursions. Call ObjectMonitor::enter(current);
  * Otherwise, some other thread is the owner and will claim ownership. Call ObjectMonitor::enter(current);
* If a thread succeeds with the CAS to 0b10 (Inflated)
  * From 0b00 (Fast Locked)
    * If the current thread is the thread that fast locked, take ownership. Update ObjectMonitor _recursions based on fast locked recursions. Call ObjectMonitor::enter(current);
    * Otherwise, some other thread is the owner and will claim ownership. Call ObjectMonitor::enter(current);
  * From 0b01 (Unlocked)
    * Claim ownership; no ObjectMonitor::enter is required.
* If a thread fails the CAS, reload the markWord and retry.

### Un-contended Inflated Locking

CAS on the _owner field in ObjectMonitor: JavaThread* (Locked By Thread) <--> nullptr (Unlocked).

### Contended Inflated Locking

Blocks out deflation. Spin CAS on the _owner field in ObjectMonitor: JavaThread* (Locked By Thread) <--> nullptr (Unlocked). Details are in ObjectMonitor.hpp.

### HashTable Resizing and Cleanup

Resizing is currently handled using logic similar to what the string and symbol tables use, and is delegated to the ServiceThread. The goal is to eventually move this to the deflation thread, to allow for better interactions with the deflation cycles, making it possible to also shrink the table. But this will be done incrementally as a separate enhancement. The ServiceThread is currently used to deal with the fact that the deflation thread can be turned off via JVM options.
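Returning to the creation protocol under Creating ObjectMonitor, the ownership decision in that list can be sketched as follows. The sketch assumes a mutex-guarded `std::unordered_map` as a stand-in for the concurrent hash table, ignores deflation (neither the contentions reference counter nor the `_recursions` update is modelled), and uses integer owners; `MonitorTable`, `inflate_with_table`, and `Next` are hypothetical names.

```c++
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

constexpr std::uintptr_t kLockMask       = 0b11;
constexpr std::uintptr_t kFastLocked     = 0b00;
constexpr std::uintptr_t kUnlocked       = 0b01;
constexpr std::uintptr_t kInflated       = 0b10;
constexpr std::uintptr_t kAnonymousOwner = 1;  // plays the role of ANONYMOUS_OWNER

struct Object {
  std::atomic<std::uintptr_t> mark;  // never overwritten: only lock bits change
  explicit Object(std::uintptr_t m) : mark(m) {}
};

struct Monitor {
  std::atomic<std::uintptr_t> owner{kAnonymousOwner};  // inserted anonymously owned
};

// Mutex-guarded stand-in for the concurrent Object -> ObjectMonitor table.
class MonitorTable {
 public:
  // Insert-if-absent: only one monitor per object is ever inserted, so all
  // racing inflaters get the same anonymously owned monitor back.
  Monitor* get_or_insert(const Object* obj) {
    std::lock_guard<std::mutex> guard(mutex_);
    std::unique_ptr<Monitor>& slot = map_[obj];
    if (!slot) slot = std::make_unique<Monitor>();
    return slot.get();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<const Object*, std::unique_ptr<Monitor>> map_;
};

enum class Next { kOwned, kEnter };  // kEnter: caller calls enter(current)

// Transition the lock bits to 0b10 and decide ownership as in the Creating
// ObjectMonitor list. `fast_locked_by_me` would come from the lock stack.
Next inflate_with_table(MonitorTable& table, Object& obj, std::uintptr_t self,
                        bool fast_locked_by_me, Monitor** out) {
  Monitor* m = table.get_or_insert(&obj);
  *out = m;
  std::uintptr_t observed = obj.mark.load();
  for (;;) {
    std::uintptr_t bits = observed & kLockMask;
    std::uintptr_t desired = (observed & ~kLockMask) | kInflated;
    if (bits == kUnlocked) {
      if (!obj.mark.compare_exchange_strong(observed, desired)) continue;
      m->owner.store(self);  // won 0b01 -> 0b10: claim ownership, no enter()
      return Next::kOwned;
    }
    if (bits == kFastLocked &&
        !obj.mark.compare_exchange_strong(observed, desired)) {
      continue;  // CAS failed: `observed` was reloaded, retry
    }
    // 0b10 is now set (by us or another thread). The fast-locker replaces
    // the anonymous owner; every thread then goes through enter().
    if (fast_locked_by_me) {
      std::uintptr_t anonymous = kAnonymousOwner;
      m->owner.compare_exchange_strong(anonymous, self);
    }
    return Next::kEnter;
  }
}
```

A mutex keeps the sketch short; the point it illustrates is that insert-if-absent hands every racing inflater the same anonymously owned monitor, and only the markWord transition decides who owns it.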
Cleanup is mostly handled by the deflator, which actively removes deflated monitors, including monitors for dead objects. However, any thread is allowed to remove the ObjectMonitor* associations of dead objects, but actual memory reclamation of the ObjectMonitor is always handled by the deflator.

The table is currently initialized before `init_globals`; as such, the max size of the table, which is based on `MaxHeapSize`, may be incorrect because `MaxHeapSize` is not yet finalized.

-------------

Commit messages:
 - 8315884: New Object to ObjectMonitor mapping

Changes: https://git.openjdk.org/jdk/pull/20067/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8315884
Stats: 3613 lines in 70 files changed: 2700 ins; 313 del; 600 mod
Patch: https://git.openjdk.org/jdk/pull/20067.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067

PR: https://git.openjdk.org/jdk/pull/20067