On Thu, 30 Jan 2025 12:12:29 GMT, Albert Mingkun Yang <ay...@openjdk.org> wrote:
> Here is an attempt to simplify GCLocker implementation for Serial and > Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., > calling JNI critical APIs). JDK-7129164 introduces an optimization that > updates a shared variable (used to track the number of threads in the > critical region) only if there is a pending GC request. However, this also > means that after reaching a GC safepoint, we may discover that GCLocker is > active, preventing a GC cycle from being invoked. The inability to perform GC > at a safepoint adds complexity -- for example, a caller must retry allocation > if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads > exit the critical region before reaching a GC safepoint. This guarantees that > once inside the safepoint, we can successfully invoke a GC cycle. The > approach takes inspiration from `ZJNICritical`, but some regressions were > observed in j2dbench (on Windows) and the micro-benchmark in > [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, > instead of relying on atomic operations on a global variable when entering or > leaving the critical region, this PR uses an existing thread-local variable > with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, > SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 * Idk if GCLocker JFR events need to be available in metadata.xml if the VM does not actually ever send it. I think it does not. Maybe it is used to decode from old recordings, may be worth asking e.g. @egahlin . * the bot shows a failure that this PR's CR number shows up in the problemlist, that line needs to be deleted as well. Further it would be interesting to see how many retries there are in the allocation loop with these jnilock* stress test. * another issue, probably todo is that while Parallel GC has the emergency bailout via GC Overhead limit after excessive retries, Serial does not. Which means that it might retry for a long time, which isn't good (while it did earlier if the number of retries due to gclocker exceed that threshold) src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 323: > 321: } > 322: > 323: if (result == nullptr) { pre-existing: is it actually possible that `result` is not `nullptr` here? The code above always returns with a non-null result. Maybe assert this instead. src/hotspot/share/gc/shared/gcLocker.cpp line 86: > 84: void GCLocker::block() { > 85: assert(_lock->is_locked(), "precondition"); > 86: assert(Atomic::load(&_is_gc_request_pending) == false, "precondition"); Suggestion: assert(!Atomic::load(&_is_gc_request_pending), "precondition"); src/hotspot/share/gc/shared/gcLocker.cpp line 106: > 104: > 105: #ifdef ASSERT > 106: // Matching the storestore in GCLocker::exit Suggestion: // Matching the storestore in GCLocker::exit. src/hotspot/share/gc/shared/gcLocker.cpp line 114: > 112: void GCLocker::unblock() { > 113: assert(_lock->is_locked(), "precondition"); > 114: assert(Atomic::load(&_is_gc_request_pending) == true, "precondition"); Suggestion: assert(Atomic::load(&_is_gc_request_pending), "precondition"); src/hotspot/share/gc/shared/gcLocker.hpp line 31: > 29: #include "memory/allStatic.hpp" > 30: #include "runtime/mutex.hpp" > 31: Documentation how GCLocker works/is supposed to work is missing here. It's not exactly trivial. src/hotspot/share/gc/shared/gcLocker.hpp line 33: > 31: > 32: class GCLocker: public AllStatic { > 33: static Monitor* _lock; Not sure if having this copy/reference to `Heap_lock` makes the code more clear than referencing `Heap_lock` directly. It needs to be `Heap_lock` anyway. src/hotspot/share/gc/shared/gcLocker.hpp line 37: > 35: > 36: #ifdef ASSERT > 37: static uint64_t _debug_count; Maybe the variable could be named something less generic, indicating what it is counting. Or add a comment. src/hotspot/share/gc/shared/gcLocker.inline.hpp line 40: > 38: if (Atomic::load(&_is_gc_request_pending)) { > 39: thread->exit_critical(); > 40: // slow-path Suggestion: Not sure what this `slow-path` comment helps with. Maybe it is describing the next method (but it is named very similarly), or this is an attempt to describe the true-block of the if. In the latter case, it would maybe be better to put this comment at the start of the true-block of the if, and say something more descriptive like `// Another thread is requesting gc, enter slow path.` Not sure, feel free to ignore, it's just that to me the comment should either be removed or put upwards a line. src/hotspot/share/gc/shared/gcLocker.inline.hpp line 56: > 54: if (thread->in_last_critical()) { > 55: Atomic::add(&_debug_count, (uint64_t)-1); > 56: // Matching the loadload in GCLocker::block Suggestion: // Matching the loadload in GCLocker::block. src/hotspot/share/gc/shared/gcTraceSend.cpp line 364: > 362: #if INCLUDE_JFR > 363: > 364: #endif Please remove this empty `#if/#endif` block. src/hotspot/share/gc/shared/gc_globals.hpp line 162: > 160: "blocked by the GC locker") > \ > 161: range(0, max_uintx) > \ > 162: > \ This removal should warrant a release note; while it's a diagnostic option and we can remove at a whim, it is in use to workaround issues. src/hotspot/share/prims/whitebox.cpp line 48: > 46: #include "gc/shared/concurrentGCBreakpoints.hpp" > 47: #include "gc/shared/gcConfig.hpp" > 48: #include "gc/shared/gcLocker.hpp" Suggestion: The file does not seem to use the `GCLocker` class anymore, please remove this line as well. ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2592106484 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940732531 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940775211 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940813063 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940779840 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940770235 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940769765 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940796501 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940793704 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940812598 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940746077 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940748992 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940752118