On Wed, 2 Apr 2025 14:02:25 GMT, Robert Toyonaga <[email protected]> wrote:
>> ### Summary:
>> This PR makes memory operations atomic with NMT accounting.
>>
>> ### The problem:
>> In memory related functions like `os::commit_memory` and
>> `os::reserve_memory` the OS memory operations are currently done before
>> acquiring the the NMT mutex. And the the virtual memory accounting is done
>> later in `MemTracker`, after the lock has been acquired. Doing the memory
>> operations outside of the lock scope can lead to races.
>>
>> 1.1 Thread_1 releases range_A.
>> 1.2 Thread_1 tells NMT "range_A has been released".
>>
>> 2.1 Thread_2 reserves (the now free) range_A.
>> 2.2 Thread_2 tells NMT "range_A is reserved".
>>
>> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating
>> after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid
>> subsequent calls (release range_A, followed by map range_A). But NMT sees
>> "reserve range_A", "release range_A" and is now out of sync with the OS.
>>
>> ### Solution:
>> Where memory operations such as reserve, commit, or release virtual memory
>> happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both
>> the NMT accounting and the memory operation itself.
>>
>> ### Other notes:
>> I also simplified this pattern found in many places:
>>
>> if (MemTracker::enabled()) {
>> MemTracker::NmtVirtualMemoryLocker nvml;
>> result = pd_some_operation(addr, bytes);
>> if (result != nullptr) {
>> MemTracker::record_some_operation(addr, bytes);
>> }
>> } else {
>> result = pd_unmap_memory(addr, bytes);
>> }
>> ```
>> To:
>>
>> MemTracker::NmtVirtualMemoryLocker nvml;
>> result = pd_unmap_memory(addr, bytes);
>> MemTracker::record_some_operation(addr, bytes);
>> ```
>> This is possible because `NmtVirtualMemoryLocker` now checks
>> `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks
>> `MemTracker::enabled()` and checks against nullptr. This refactoring
>> previously wasn't possible because `ThreadCritical` was used before
>> https://github.com/openjdk/jdk/pull/22745 introduced
>> `NmtVirtualMemoryLocker`.
>>
>> I considered moving the locking and NMT accounting down into platform
>> specific code: Ex. lock around { munmap() + MemTracker::record }. The hope
>> was that this would help reduce the size of the critical section. However, I
>> found that the OS-specific "pd_" functions are already short and
>> to-the-point, so doing this wasn't reducing the lock scope very much.
>> Instead it just makes the code more messy by having to maintain the locking
>> and NMT accounting in each platform specific i...
>
> Robert Toyonaga has updated the pull request incrementally with two
> additional commits since the last revision:
>
> - tests and comments
> - Revert "make memory op and NMT accounting atomic"
>
> This reverts commit 86423d0b7e8e2b0b313a686a64c803028a5f2420.
Tonight we tested this PR on AIX and it failed in the gtest with
Internal Error (os_aix.cpp:1917), pid=26476938, tid=258
Error: guarantee((vmi)) failed
This will happen if a
`os::pd_commit_memory()` or `os::pd_release_memory()` or
`os::pd_uncommit_memory()`
is called on memory not allocated with
`os::pd_reserve_memory()` or `os::pd_attempt_map_memory_to_file_at()` or
`os::pd_attempt_reserve_memory_at()`
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2775248289