On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga <[email protected]> wrote:
> ### Summary:
> This PR makes memory operations atomic with NMT accounting.
>
> ### The problem:
> In memory related functions like `os::commit_memory` and `os::reserve_memory`
> the OS memory operations are currently done before acquiring the the NMT
> mutex. And the the virtual memory accounting is done later in `MemTracker`,
> after the lock has been acquired. Doing the memory operations outside of the
> lock scope can lead to races.
>
> 1.1 Thread_1 releases range_A.
> 1.2 Thread_1 tells NMT "range_A has been released".
>
> 2.1 Thread_2 reserves (the now free) range_A.
> 2.2 Thread_2 tells NMT "range_A is reserved".
>
> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating
> after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid
> subsequent calls (release range_A, followed by map range_A). But NMT sees
> "reserve range_A", "release range_A" and is now out of sync with the OS.
>
> ### Solution:
> Where memory operations such as reserve, commit, or release virtual memory
> happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both
> the NMT accounting and the memory operation itself.
>
> ### Other notes:
> I also simplified this pattern found in many places:
>
> if (MemTracker::enabled()) {
> MemTracker::NmtVirtualMemoryLocker nvml;
> result = pd_some_operation(addr, bytes);
> if (result != nullptr) {
> MemTracker::record_some_operation(addr, bytes);
> }
> } else {
> result = pd_unmap_memory(addr, bytes);
> }
> ```
> To:
>
> MemTracker::NmtVirtualMemoryLocker nvml;
> result = pd_unmap_memory(addr, bytes);
> MemTracker::record_some_operation(addr, bytes);
> ```
> This is possible because `NmtVirtualMemoryLocker` now checks
> `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks
> `MemTracker::enabled()` and checks against nullptr. This refactoring
> previously wasn't possible because `ThreadCritical` was used before
> https://github.com/openjdk/jdk/pull/22745 introduced
> `NmtVirtualMemoryLocker`.
>
> I considered moving the locking and NMT accounting down into platform
> specific code: Ex. lock around { munmap() + MemTracker::record }. The hope
> was that this would help reduce the size of the critical section. However, I
> found that the OS-specific "pd_" functions are already short and
> to-the-point, so doing this wasn't reducing the lock scope very much. Instead
> it just makes the code more messy by having to maintain the locking and NMT
> accounting in each platform specific implementation.
>
> In many places I've done minor refactoring by relocating call...
OK should I update this PR to do the following things:
- Add comments explaining the asymmetrical locking and warning against
patterns that lead to races
- swapping the order of `NmtVirtualMemoryLocker` and release/uncommit
- Fail fatally if release/uncommit does not complete.
Or does it make more sense to do that in a different issue/PR?
Also, do we want to keep the new tests and the refactorings (see below)?
if (MemTracker::enabled()) {
MemTracker::NmtVirtualMemoryLocker nvml;
result = pd_some_operation(addr, bytes);
if (result != nullptr) {
MemTracker::record_some_operation(addr, bytes);
}
} else {
result = pd_unmap_memory(addr, bytes);
}
To:
MemTracker::NmtVirtualMemoryLocker nvml;
result = pd_unmap_memory(addr, bytes);
MemTracker::record_some_operation(addr, bytes);
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2766365411