On Thu, 19 Dec 2024 21:04:51 GMT, Robert Toyonaga <d...@openjdk.org> wrote:

>> This is a redo of [JDK-8304824](https://bugs.openjdk.org/browse/JDK-8304824) 
>> which was backed out by 
>> [JDK-8343726](https://bugs.openjdk.org/browse/JDK-8343726) due to problems 
>> documented in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244).
>> 
>> The problem was that `NmtVirtualMemoryLocker` was not locking when the 
>> current thread is not attached by checking `Thread::current_or_null_safe() 
>> != nullptr`. This is necessary during VM init, but should not be allowed 
>> afterward. NMT may be used in `attach_current_thread` before the current 
>> thread is set. The lock was not being acquired in that case, which 
>> intermittently caused NMT accounting to become corrupted, triggering various 
>> assertions when future NMT operations are done.  To fix this, I've adopted 
>> [Thomas' 
>> suggestion](https://github.com/openjdk/jdk/pull/21928#issuecomment-2460238057)
>>  to reverse the order of 
>> 
>> 
>> thread->register_thread_stack_with_NMT();
>> thread->initialize_thread_current();
>> 
>> 
>> in `attach_current_thread`.  This allows `NmtVirtualMemoryLocker` to be 
>> locked after current thread is set. 
>> 
>> To allow for `NmtVirtualMemoryLocker` to be used during VM init, I've 
>> replaced the `ConditionalMutexLocker` check `Thread::current_or_null_safe() 
>> != nullptr` with a new flag: `_done_bootstrap`. This flag prevents the lock 
>> from being used during VM init, at which point we are single threaded 
>> anyway. This avoids errors due to Hotspot mutexes and current thread not yet 
>> being ready. 
>> 
>> I also added new asserts in `virtualMemoryTracker.cpp` to catch future bugs 
>> like this where the lock is not held when it should be. I updated the 
>> appropriate VMT tests to also lock (there were a few cases where locking was 
>> being bypassed) so they can pass the new asserts.
>> 
>> I also removed the unused `_query_lock` variable in `MemTracker`.
>> 
>> Testing: 
>> 
>> - On Linux amd64, I was able to consistently reproduce the errors described 
>> in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244) by increasing 
>> the number of test threads in 
>> `java/lang/Thread/jni/AttachCurrentThread/AttachTest.java`. The test 
>> consistently passes with the new changes in this PR.
>> - hotspot_nmt , gtest:VirtualSpace, tier1
>
> Robert Toyonaga has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Revert to using NmtVirtualMemoryLocker. Use defaultStream. Add comment for 
> SharedDecoder_lock

@roberttoyonaga thank you for taking this on again.

Could you please add a comment somewhere that describes the locking problem we 
try to solve in all its ugliness? I keep forgetting the details myself.

Something along these lines:

We need locking in NMT for mmap-based allocated (malloc is lockless). Because 
mmap-based allocations use global data structures. But we also need those lock 
sections to include the OS-side mmap/munmap calls that caused modifications of 
said sections. So, `mmap` calls must happen together with their associated 
`MemTracker::record_reserve`, same for `munmap` and 
`MemTracker::record_release`. Without locking, NMT may see states that are 
invalid. For example this order: `T1 calls munmap; T2 calls mmap (same region); 
T2 calls MemTracker::record_reserve; T1  calls MemTracker::record_release;` 
would result in NMT seeing a `MemTracker::record_reserve` for a region that, in 
its opinion, is still occupied.

So we lock, and we want to use Mutex. This has two problems: a) does not work 
before mutexes have been initialized. b) does not work if Thread::current is 
NULL. 

(a) is a problem for early mmaps. (Example: polling page setup ?)
(b) is a problem if a thread tries to use mmap but has not yet set 
Thread::current. This was stack region registration with NMT (because we misuse 
NMT mmap management for registering stack threads).

src/hotspot/os/windows/os_windows.cpp line 3626:

> 3624:       os::print_memory_mappings((char*)start, bytes, &fs);
> 3625:       assert(false, "bad release: [" PTR_FORMAT "-" PTR_FORMAT "): %s", 
> p2i(start), p2i(end), err);
> 3626: #endif

@roberttoyonaga Thinking about this, I propose to just remove the 
os::print_memory_mappings call here. I added this way back when we had problems 
on Windows with removing "striped" NUMA mappings, but those issues have long 
been solved. We still have the assertion check, so we will notice if something 
goes wrong.

-------------

PR Review: https://git.openjdk.org/jdk/pull/22745#pullrequestreview-2513685220
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894197615

Reply via email to