On Fri, 20 Dec 2024 21:29:18 GMT, Robert Toyonaga <d...@openjdk.org> wrote:
>> This is a redo of [JDK-8304824](https://bugs.openjdk.org/browse/JDK-8304824) >> which was backed out by >> [JDK-8343726](https://bugs.openjdk.org/browse/JDK-8343726) due to problems >> documented in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244). >> >> The problem was that `NmtVirtualMemoryLocker` was not locking when the >> current thread is not attached by checking `Thread::current_or_null_safe() >> != nullptr`. This is necessary during VM init, but should not be allowed >> afterward. NMT may be used in `attach_current_thread` before the current >> thread is set. The lock was not being acquired in that case, which >> intermittently caused NMT accounting to become corrupted, triggering various >> assertions when future NMT operations are done. To fix this, I've adopted >> [Thomas' >> suggestion](https://github.com/openjdk/jdk/pull/21928#issuecomment-2460238057) >> to reverse the order of >> >> >> thread->register_thread_stack_with_NMT(); >> thread->initialize_thread_current(); >> >> >> in `attach_current_thread`. This allows `NmtVirtualMemoryLocker` to be >> locked after current thread is set. >> >> To allow for `NmtVirtualMemoryLocker` to be used during VM init, I've >> replaced the `ConditionalMutexLocker` check `Thread::current_or_null_safe() >> != nullptr` with a new flag: `_done_bootstrap`. This flag prevents the lock >> from being used during VM init, at which point we are single threaded >> anyway. This avoids errors due to Hotspot mutexes and current thread not yet >> being ready. >> >> I also added new asserts in `virtualMemoryTracker.cpp` to catch future bugs >> like this where the lock is not held when it should be. I updated the >> appropriate VMT tests to also lock (there were a few cases where locking was >> being bypassed) so they can pass the new asserts. >> >> I also removed the unused `_query_lock` variable in `MemTracker`. >> >> Testing: >> >> - On Linux amd64, I was able to consistently reproduce the errors described >> in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244) by increasing >> the number of test threads in >> `java/lang/Thread/jni/AttachCurrentThread/AttachTest.java`. The test >> consistently passes with the new changes in this PR. >> - hotspot_nmt , gtest:VirtualSpace, tier1 > > Robert Toyonaga has updated the pull request incrementally with one > additional commit since the last revision: > > move boostrapping_done flag into Mutex from MutexLocker Changes requested by kbarrett (Reviewer). src/hotspot/share/nmt/threadStackTracker.cpp line 55: > 53: align_thread_stack_boundaries_inward(base, size); > 54: > 55: MemTracker::MemTracker::NmtVirtualMemoryLocker nvml; Too many "MemTracker"s here? src/hotspot/share/nmt/virtualMemoryTracker.cpp line 632: > 630: if (Mutex::is_bootstrapping_done()) { > 631: assert_lock_strong(NmtVirtualMemory_lock); > 632: } It seems like this can use `MemTracker::assert_locked()`, since that is conditioned on bootstrapping state. src/hotspot/share/runtime/mutex.hpp line 222: > 220: } > 221: private: > 222: static bool _bootstrapping_done; A description of what these mean would be nice. It seems to be mutex_init completed + threading initialization completed? src/hotspot/share/runtime/mutexLocker.cpp line 291: > 289: MUTEX_DEFN(NMTQuery_lock , PaddedMutex , safepoint); > 290: MUTEX_DEFN(NMTCompilationCostHistory_lock , PaddedMutex , > nosafepoint); > 291: MUTEX_DEFN(NmtVirtualMemory_lock , PaddedMutex , > service-4); // Must be lower than G1Mapper_lock used from > G1RegionsSmallerThanCommitSizeMapper::commit_regions Is it really okay for nmt to use a VM mutex? That means relevant functions can only be called from a VM thread. In particular, what happens if we get into the error handler because of a signal on a non-VM thread, or something similarly weird. This is also replacing the PlatformMutex that was being used by MemoryFileTracker. I don't know why that was using a PlatformMutex, but I presume there was a reason. It was doing so in the initial version of JDK-8312132. Looking back through the development commits, is see these two in sequence: 3472573 Introduce a separate Mutex for MemoryFileTracker 46f10f6 Use own PlatformMutex instead I found no discussion about that choice in the PR. Maybe it was because of problems determining the needed lock rank? (The original VM mutex had Mutex::service rank. That's apparently too high, as discussed in this change.) Or maybe there was some other reason? I'll ask @jdksjolen about this, though holidays... ------------- PR Review: https://git.openjdk.org/jdk/pull/22745#pullrequestreview-2518602318 PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894576359 PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894576955 PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894652359 PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894657610