On Thu, 30 Jan 2025 12:12:29 GMT, Albert Mingkun Yang <ay...@openjdk.org> wrote:

> Here is an attempt to simplify GCLocker implementation for Serial and 
> Parallel.
> 
> GCLocker prevents GC when Java threads are in a critical region (i.e., 
> calling JNI critical APIs). JDK-7129164 introduces an optimization that 
> updates a shared variable (used to track the number of threads in the 
> critical region) only if there is a pending GC request. However, this also 
> means that after reaching a GC safepoint, we may discover that GCLocker is 
> active, preventing a GC cycle from being invoked. The inability to perform GC 
> at a safepoint adds complexity -- for example, a caller must retry allocation 
> if the request fails due to GC being inhibited by GCLocker.
> 
> The proposed patch uses a readers-writer lock to ensure that all Java threads 
> exit the critical region before reaching a GC safepoint. This guarantees that 
> once inside the safepoint, we can successfully invoke a GC cycle. The 
> approach takes inspiration from `ZJNICritical`, but some regressions were 
> observed in j2dbench (on Windows) and the micro-benchmark in 
> [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, 
> instead of relying on atomic operations on a global variable when entering or 
> leaving the critical region, this PR uses an existing thread-local variable 
> with a store-load barrier for synchronization.
> 
> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, 
> SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress.
> 
> Test: tier1-8

* Idk if GCLocker JFR events need to be available in metadata.xml if the VM 
does not actually ever send it. I think it does not.
Maybe it is used to decode from old recordings, may be worth asking e.g. 
@egahlin .
* the bot shows a failure that this PR's CR number shows up in the problemlist, 
that line needs to be deleted as well. Further it would be interesting to see 
how many retries there are in the allocation loop with these jnilock* stress 
test.
* another issue, probably todo is that while Parallel GC has the emergency 
bailout via GC Overhead limit after excessive retries, Serial does not. Which 
means that it might retry for a long time, which isn't good (while it did 
earlier if the number of retries due to gclocker exceed that threshold)

src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 323:

> 321:     }
> 322: 
> 323:     if (result == nullptr) {

pre-existing: is it actually possible that `result` is not `nullptr` here? The 
code above always returns with a non-null result.
Maybe assert this instead.

src/hotspot/share/gc/shared/gcLocker.cpp line 86:

> 84: void GCLocker::block() {
> 85:   assert(_lock->is_locked(), "precondition");
> 86:   assert(Atomic::load(&_is_gc_request_pending) == false, "precondition");

Suggestion:

  assert(!Atomic::load(&_is_gc_request_pending), "precondition");

src/hotspot/share/gc/shared/gcLocker.cpp line 106:

> 104: 
> 105: #ifdef ASSERT
> 106:   // Matching the storestore in GCLocker::exit

Suggestion:

  // Matching the storestore in GCLocker::exit.

src/hotspot/share/gc/shared/gcLocker.cpp line 114:

> 112: void GCLocker::unblock() {
> 113:   assert(_lock->is_locked(), "precondition");
> 114:   assert(Atomic::load(&_is_gc_request_pending) == true, "precondition");

Suggestion:

  assert(Atomic::load(&_is_gc_request_pending), "precondition");

src/hotspot/share/gc/shared/gcLocker.hpp line 31:

> 29: #include "memory/allStatic.hpp"
> 30: #include "runtime/mutex.hpp"
> 31: 

Documentation how GCLocker works/is supposed to work is missing here. It's not 
exactly trivial.

src/hotspot/share/gc/shared/gcLocker.hpp line 33:

> 31: 
> 32: class GCLocker: public AllStatic {
> 33:   static Monitor* _lock;

Not sure if having this copy/reference to `Heap_lock` makes the code more clear 
than referencing `Heap_lock` directly. It needs to be `Heap_lock` anyway.

src/hotspot/share/gc/shared/gcLocker.hpp line 37:

> 35: 
> 36: #ifdef ASSERT
> 37:   static uint64_t _debug_count;

Maybe the variable could be named something less generic, indicating what it is 
counting. Or add a comment.

src/hotspot/share/gc/shared/gcLocker.inline.hpp line 40:

> 38:     if (Atomic::load(&_is_gc_request_pending)) {
> 39:       thread->exit_critical();
> 40:       // slow-path

Suggestion:



Not sure what this `slow-path` comment helps with. Maybe it is describing the 
next method (but it is named very similarly), or this is an attempt to describe 
the true-block of the if.
In the latter case, it would maybe be better to put this comment at the start 
of the true-block of the if, and say something more descriptive like `// 
Another thread is requesting gc, enter slow path.`

Not sure, feel free to ignore, it's just that to me the comment should either 
be removed or put upwards a line.

src/hotspot/share/gc/shared/gcLocker.inline.hpp line 56:

> 54:   if (thread->in_last_critical()) {
> 55:     Atomic::add(&_debug_count, (uint64_t)-1);
> 56:     // Matching the loadload in GCLocker::block

Suggestion:

    // Matching the loadload in GCLocker::block.

src/hotspot/share/gc/shared/gcTraceSend.cpp line 364:

> 362: #if INCLUDE_JFR
> 363: 
> 364: #endif

Please remove this empty `#if/#endif` block.

src/hotspot/share/gc/shared/gc_globals.hpp line 162:

> 160:           "blocked by the GC locker")                                    
>    \
> 161:           range(0, max_uintx)                                            
>    \
> 162:                                                                          
>    \

This removal should warrant a release note; while it's a diagnostic option and 
we can remove at a whim, it is in use to workaround issues.

src/hotspot/share/prims/whitebox.cpp line 48:

> 46: #include "gc/shared/concurrentGCBreakpoints.hpp"
> 47: #include "gc/shared/gcConfig.hpp"
> 48: #include "gc/shared/gcLocker.hpp"

Suggestion:


The file does not seem to use the `GCLocker` class anymore, please remove this 
line as well.

-------------

Changes requested by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2592106484
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940732531
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940775211
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940813063
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940779840
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940770235
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940769765
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940796501
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940793704
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940812598
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940746077
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940748992
PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940752118

Reply via email to