On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga <d...@openjdk.org> wrote:
> ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory related functions like `os::commit_memory` and `os::reserve_memory` > the OS memory operations are currently done before acquiring the the NMT > mutex. And the the virtual memory accounting is done later in `MemTracker`, > after the lock has been acquired. Doing the memory operations outside of the > lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating > after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid > subsequent calls (release range_A, followed by map range_A). But NMT sees > "reserve range_A", "release range_A" and is now out of sync with the OS. > > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory > happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both > the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_unmap_memory(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_unmap_memory(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks > `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks > `MemTracker::enabled()` and checks against nullptr. This refactoring > previously wasn't possible because `ThreadCritical` was used before > https://github.com/openjdk/jdk/pull/22745 introduced > `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform > specific code: Ex. lock around { munmap() + MemTracker::record }. The hope > was that this would help reduce the size of the critical section. However, I > found that the OS-specific "pd_" functions are already short and > to-the-point, so doing this wasn't reducing the lock scope very much. Instead > it just makes the code more messy by having to maintain the locking and NMT > accounting in each platform specific implementation. > > In many places I've done minor refactoring by relocating call... OK should I update this PR to do the following things: - Add comments explaining the asymmetrical locking and warning against patterns that lead to races - swapping the order of `NmtVirtualMemoryLocker` and release/uncommit - Fail fatally if release/uncommit does not complete. Or does it make more sense to do that in a different issue/PR? Also, do we want to keep the new tests and the refactorings (see below)? if (MemTracker::enabled()) { MemTracker::NmtVirtualMemoryLocker nvml; result = pd_some_operation(addr, bytes); if (result != nullptr) { MemTracker::record_some_operation(addr, bytes); } } else { result = pd_unmap_memory(addr, bytes); } To: MemTracker::NmtVirtualMemoryLocker nvml; result = pd_unmap_memory(addr, bytes); MemTracker::record_some_operation(addr, bytes); ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2766365411