On Fri, 20 Dec 2024 21:29:18 GMT, Robert Toyonaga <d...@openjdk.org> wrote:

>> This is a redo of [JDK-8304824](https://bugs.openjdk.org/browse/JDK-8304824) 
>> which was backed out by 
>> [JDK-8343726](https://bugs.openjdk.org/browse/JDK-8343726) due to problems 
>> documented in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244).
>> 
>> The problem was that `NmtVirtualMemoryLocker` was not locking when the 
>> current thread is not attached by checking `Thread::current_or_null_safe() 
>> != nullptr`. This is necessary during VM init, but should not be allowed 
>> afterward. NMT may be used in `attach_current_thread` before the current 
>> thread is set. The lock was not being acquired in that case, which 
>> intermittently caused NMT accounting to become corrupted, triggering various 
>> assertions when future NMT operations are done.  To fix this, I've adopted 
>> [Thomas' 
>> suggestion](https://github.com/openjdk/jdk/pull/21928#issuecomment-2460238057)
>>  to reverse the order of 
>> 
>> 
>> thread->register_thread_stack_with_NMT();
>> thread->initialize_thread_current();
>> 
>> 
>> in `attach_current_thread`.  This allows `NmtVirtualMemoryLocker` to be 
>> locked after current thread is set. 
>> 
>> To allow for `NmtVirtualMemoryLocker` to be used during VM init, I've 
>> replaced the `ConditionalMutexLocker` check `Thread::current_or_null_safe() 
>> != nullptr` with a new flag: `_done_bootstrap`. This flag prevents the lock 
>> from being used during VM init, at which point we are single threaded 
>> anyway. This avoids errors due to Hotspot mutexes and current thread not yet 
>> being ready. 
>> 
>> I also added new asserts in `virtualMemoryTracker.cpp` to catch future bugs 
>> like this where the lock is not held when it should be. I updated the 
>> appropriate VMT tests to also lock (there were a few cases where locking was 
>> being bypassed) so they can pass the new asserts.
>> 
>> I also removed the unused `_query_lock` variable in `MemTracker`.
>> 
>> Testing: 
>> 
>> - On Linux amd64, I was able to consistently reproduce the errors described 
>> in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244) by increasing 
>> the number of test threads in 
>> `java/lang/Thread/jni/AttachCurrentThread/AttachTest.java`. The test 
>> consistently passes with the new changes in this PR.
>> - hotspot_nmt , gtest:VirtualSpace, tier1
>
> Robert Toyonaga has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   move boostrapping_done flag into Mutex from MutexLocker

Changes requested by kbarrett (Reviewer).

src/hotspot/share/nmt/threadStackTracker.cpp line 55:

> 53:   align_thread_stack_boundaries_inward(base, size);
> 54: 
> 55:   MemTracker::MemTracker::NmtVirtualMemoryLocker nvml;

Too many "MemTracker"s here?

src/hotspot/share/nmt/virtualMemoryTracker.cpp line 632:

> 630:     if (Mutex::is_bootstrapping_done()) {
> 631:       assert_lock_strong(NmtVirtualMemory_lock);
> 632:     }

It seems like this can use `MemTracker::assert_locked()`, since that is 
conditioned on bootstrapping state.

src/hotspot/share/runtime/mutex.hpp line 222:

> 220:   }
> 221:  private:
> 222:   static bool _bootstrapping_done;

A description of what these mean would be nice. It seems to be mutex_init 
completed + threading
initialization completed?

src/hotspot/share/runtime/mutexLocker.cpp line 291:

> 289:   MUTEX_DEFN(NMTQuery_lock                   , PaddedMutex  , safepoint);
> 290:   MUTEX_DEFN(NMTCompilationCostHistory_lock  , PaddedMutex  , 
> nosafepoint);
> 291:   MUTEX_DEFN(NmtVirtualMemory_lock           , PaddedMutex  , 
> service-4); // Must be lower than G1Mapper_lock used from 
> G1RegionsSmallerThanCommitSizeMapper::commit_regions

Is it really okay for nmt to use a VM mutex? That means relevant functions can
only be called from a VM thread. In particular, what happens if we get into
the error handler because of a signal on a non-VM thread, or something
similarly weird.

This is also replacing the PlatformMutex that was being used by
MemoryFileTracker. I don't know why that was using a PlatformMutex, but I
presume there was a reason. It was doing so in the initial version of
JDK-8312132. Looking back through the development commits, is see these two in
sequence:

3472573 Introduce a separate Mutex for MemoryFileTracker
46f10f6 Use own PlatformMutex instead

I found no discussion about that choice in the PR. Maybe it was because of
problems determining the needed lock rank? (The original VM mutex had
Mutex::service rank. That's apparently too high, as discussed in this change.)
Or maybe there was some other reason?

I'll ask @jdksjolen about this, though holidays...

-------------

PR Review: https://git.openjdk.org/jdk/pull/22745#pullrequestreview-2518602318
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894576359
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894576955
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894652359
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894657610

Reply via email to