On Mon, 24 Mar 2025 16:16:34 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:

>> ### Summary:
>> This PR makes memory operations atomic with NMT accounting.
>> 
>> ### The problem:
>> In memory related functions like `os::commit_memory` and 
>> `os::reserve_memory` the OS memory operations are currently done before 
>> acquiring the the NMT mutex. And the the virtual memory accounting is done 
>> later in `MemTracker`, after the lock has been acquired. Doing the memory 
>> operations outside of the lock scope can lead to races. 
>> 
>> 1.1 Thread_1 releases range_A.
>> 1.2 Thread_1 tells NMT "range_A has been released".
>> 
>> 2.1 Thread_2 reserves (the now free) range_A.
>> 2.2 Thread_2 tells NMT "range_A is reserved".
>> 
>> Since the sequence (1.1)  (1.2) is not atomic, if Thread_2 begins operating 
>> after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid 
>> subsequent calls (release range_A, followed by map range_A). But NMT sees 
>> "reserve range_A", "release range_A" and is now out of sync with the OS.
>> 
>> ### Solution:
>> Where memory operations such as reserve, commit, or release virtual memory 
>> happen,  I've expanded the scope of `NmtVirtualMemoryLocker` to protect both 
>> the NMT accounting and the memory operation itself. 
>> 
>> ### Other notes:
>> I also simplified this  pattern found in many places:
>> 
>> if (MemTracker::enabled()) {
>>   MemTracker::NmtVirtualMemoryLocker nvml;
>>   result = pd_some_operation(addr, bytes);
>>   if (result != nullptr) {
>>     MemTracker::record_some_operation(addr, bytes);
>>   }
>> } else {
>>   result = pd_unmap_memory(addr, bytes);
>> }
>> ``` 
>> To:
>> 
>> MemTracker::NmtVirtualMemoryLocker nvml;
>> result = pd_unmap_memory(addr, bytes);
>> MemTracker::record_some_operation(addr, bytes);
>> ``` 
>> This is possible because `NmtVirtualMemoryLocker` now checks 
>> `MemTracker::enabled()`.  `MemTracker::record_some_operation` already checks 
>> `MemTracker::enabled()` and checks against nullptr.  This refactoring 
>> previously wasn't possible because `ThreadCritical` was used before 
>> https://github.com/openjdk/jdk/pull/22745 introduced 
>> `NmtVirtualMemoryLocker`. 
>> 
>> I considered moving the locking and NMT accounting down into platform 
>> specific code:  Ex. lock around { munmap() + MemTracker::record }. The hope 
>> was that this would help reduce the size of the critical section. However, I 
>> found that the OS-specific "pd_" functions are already short and 
>> to-the-point, so doing this wasn't reducing the lock scope very much. 
>> Instead it just makes the code more messy by having to maintain the locking 
>> and NMT accounting in each platform specific i...
>
> I don't see why we need to extend the lock to be held over the 
> reserve/commit/alloc part.
> 
> If we only extend the lock on the release side, then it looks like we get the 
> required exclusion:
> 
> lock
> 1.1 Thread_1 releases range_A.
> 1.2 Thread_1 tells NMT "range_A has been released".
> unlock
> 
> 2.1 Thread_2 reserves (the now free) range_A.
> lock
> 2.2 Thread_2 tells NMT "range_A is reserved".
> unlock
> 
> We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) 
> (2.2) (1.2).
> 
> And isn't this locking scheme exactly what the current code is using? Have 
> you seen an issue that this proposed PR intends to solve? If there is such a 
> problem I wonder if there's just a missing lock extension in one of the 
> "release" operations.

Hi @stefank, I think you're right about (1.1) (2.1) (2.2) (1.2) being prevented 
by the current implementation. Is there a reason that the current 
implementation only does the wider locking for release/uncommit? Maybe (2.1) 
(1.1) (1.2) (2.2) isn't much of an issue since it's unlikely that another 
thread would uncommit/release the same base address shortly after it's 
committed/reserved? 

>Have you seen an issue that this proposed PR intends to solve? If there is 
>such a problem I wonder if there's just a missing lock extension in one of the 
>"release" operations.

I haven't seen that race in the wild, I just noticed that the memory operations 
weren't protected and thought that it could be a problem.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748848175

Reply via email to