Re: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock

2025-03-24 Thread Stefan Karlsson
On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga  wrote:

> ### Summary:
> This PR makes memory operations atomic with NMT accounting.
> 
> ### The problem:
> In memory related functions like `os::commit_memory` and `os::reserve_memory` 
> the OS memory operations are currently done before acquiring the the NMT 
> mutex. And the the virtual memory accounting is done later in `MemTracker`, 
> after the lock has been acquired. Doing the memory operations outside of the 
> lock scope can lead to races. 
> 
> 1.1 Thread_1 releases range_A.
> 1.2 Thread_1 tells NMT "range_A has been released".
> 
> 2.1 Thread_2 reserves (the now free) range_A.
> 2.2 Thread_2 tells NMT "range_A is reserved".
> 
> Since the sequence (1.1)  (1.2) is not atomic, if Thread_2 begins operating 
> after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid 
> subsequent calls (release range_A, followed by map range_A). But NMT sees 
> "reserve range_A", "release range_A" and is now out of sync with the OS.
> 
> ### Solution:
> Where memory operations such as reserve, commit, or release virtual memory 
> happen,  I've expanded the scope of `NmtVirtualMemoryLocker` to protect both 
> the NMT accounting and the memory operation itself. 
> 
> ### Other notes:
> I also simplified this  pattern found in many places:
> 
> if (MemTracker::enabled()) {
>   MemTracker::NmtVirtualMemoryLocker nvml;
>   result = pd_some_operation(addr, bytes);
>   if (result != nullptr) {
> MemTracker::record_some_operation(addr, bytes);
>   }
> } else {
>   result = pd_unmap_memory(addr, bytes);
> }
> ``` 
> To:
> 
> MemTracker::NmtVirtualMemoryLocker nvml;
> result = pd_unmap_memory(addr, bytes);
> MemTracker::record_some_operation(addr, bytes);
> ``` 
> This is possible because `NmtVirtualMemoryLocker` now checks 
> `MemTracker::enabled()`.  `MemTracker::record_some_operation` already checks 
> `MemTracker::enabled()` and checks against nullptr.  This refactoring 
> previously wasn't possible because `ThreadCritical` was used before 
> https://github.com/openjdk/jdk/pull/22745 introduced 
> `NmtVirtualMemoryLocker`. 
> 
> I considered moving the locking and NMT accounting down into platform 
> specific code:  Ex. lock around { munmap() + MemTracker::record }. The hope 
> was that this would help reduce the size of the critical section. However, I 
> found that the OS-specific "pd_" functions are already short and 
> to-the-point, so doing this wasn't reducing the lock scope very much. Instead 
> it just makes the code more messy by having to maintain the locking and NMT 
> accounting in each platform specific implementation. 
> 
> In many places I've done minor refactoring by relocating call...

I don't see why we need to extend the lock to be held over the 
reserve/commit/alloc part.

If we only extend the lock on the release side, then it looks like we get the 
required exclusion:

lock
1.1 Thread_1 releases range_A.
1.2 Thread_1 tells NMT "range_A has been released".
unlock

2.1 Thread_2 reserves (the now free) range_A.
lock
2.2 Thread_2 tells NMT "range_A is reserved".
unlock

We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) (2.2) 
(1.2).

And isn't this locking scheme exactly what the current code is using? Have you 
seen an issue that this proposed PR intends to solve? If there is such a 
problem I wonder if there's just a missing lock extension in one of the 
"release" operations.

-

Changes requested by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24084#pullrequestreview-2710963414


Re: RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently [v2]

2025-03-24 Thread Leonid Mesnik
On Mon, 24 Mar 2025 15:08:01 GMT, Kevin Walls  wrote:

>> These tests have always silently permitted a -1 return value from 
>> OperatingSystemMXBean CPU time methods.
>> 
>> They need to be stricter, but occasionally Windows 2019 returns a -1 for the 
>> first few calls of these methods.  This seems to be a Windows 2019 bug or 
>> peculiarity.  Other Windows versions are not affected.
>> 
>> GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the 
>> CPU time calls continually return -1.  They should permit -1 values, as long 
>> as subsequently a value in the valid range is read.
>> 
>> The GetProcessCpuTime test also needs to retry enough times to expect no -1 
>> values, and not just skip.  While updating this test: it has a maximum 
>> expected value of Long.MAX_VALUE, which it may as well reduce to something 
>> that does not look like a binary "all ones except for the high bit" value 
>> (without creating an ongoing game where we keep increasing the value to 
>> avoid failures in slow runs).
>
> Kevin Walls has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - whitespace
>  - whitespace

Changes requested by lmesnik (Reviewer).

test/jdk/com/sun/management/OperatingSystemMXBean/GetProcessCpuLoad.java line 
45:

> 43: for(int i = 0; i < 10; i++) {
> 44: load = mbean.getProcessCpuLoad();
> 45: if (load == -1.0) {

Please harden test to allow -1.0 on windows only.

-

PR Review: https://git.openjdk.org/jdk/pull/24186#pullrequestreview-2711575803
PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010862043


Re: RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently

2025-03-24 Thread Andrey Turbanov
On Mon, 24 Mar 2025 10:13:34 GMT, Kevin Walls  wrote:

> These tests have always silently permitted a -1 return value from 
> OperatingSystemMXBean CPU time methods.
> 
> They need to be stricter, but occasionally Windows 2019 returns a -1 for the 
> first few calls of these methods.  This seems to be a Windows 2019 bug or 
> peculiarity.  Other Windows versions are not affected.
> 
> GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the CPU 
> time calls continually return -1.  They should permit -1 values, as long as 
> subsequently a value in the valid range is read.
> 
> The GetProcessCpuTime test also needs to retry enough times to expect no -1 
> values, and not just skip.  While updating this test: it has a maximum 
> expected value of Long.MAX_VALUE, which it may as well reduce to something 
> that does not look like a binary "all ones except for the high bit" value 
> (without creating an ongoing game where we keep increasing the value to avoid 
> failures in slow runs).

test/jdk/com/sun/management/OperatingSystemMXBean/GetProcessCpuLoad.java line 
43:

> 41: double load;
> 42: 
> 43: for(int i = 0; i < 10; i++) {

Suggestion:

for (int i = 0; i < 10; i++) {

test/jdk/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java line 43:

> 41: double load;
> 42: 
> 43: for(int i = 0; i < 10; i++) {

Suggestion:

for (int i = 0; i < 10; i++) {

test/jdk/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java line 49:

> 47: // Remember a -1 in case it never gets better.
> 48: ex = new RuntimeException("getSystemCpuLoad() returns " + 
> load
> 49:  +  " which is not in the [0.0,1.0] interval");

Suggestion:

 + " which is not in the [0.0,1.0] interval");

test/jdk/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java line 53:

> 51: } else if (load < 0.0 || load > 1.0) {
> 52: throw new RuntimeException("getSystemCpuLoad() returns " 
> + load
> 53:+  " which is not in the [0.0,1.0] interval");

Suggestion:

   + " which is not in the [0.0,1.0] interval");

-

PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010344429
PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010344971
PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010345618
PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010346257


Re: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock

2025-03-24 Thread Thomas Stuefe
On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga  wrote:

> ### Summary:
> This PR makes memory operations atomic with NMT accounting.
> 
> ### The problem:
> In memory related functions like `os::commit_memory` and `os::reserve_memory` 
> the OS memory operations are currently done before acquiring the the NMT 
> mutex. And the the virtual memory accounting is done later in `MemTracker`, 
> after the lock has been acquired. Doing the memory operations outside of the 
> lock scope can lead to races. 
> 
> 1.1 Thread_1 releases range_A.
> 1.2 Thread_1 tells NMT "range_A has been released".
> 
> 2.1 Thread_2 reserves (the now free) range_A.
> 2.2 Thread_2 tells NMT "range_A is reserved".
> 
> Since the sequence (1.1)  (1.2) is not atomic, if Thread_2 begins operating 
> after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid 
> subsequent calls (release range_A, followed by map range_A). But NMT sees 
> "reserve range_A", "release range_A" and is now out of sync with the OS.
> 
> ### Solution:
> Where memory operations such as reserve, commit, or release virtual memory 
> happen,  I've expanded the scope of `NmtVirtualMemoryLocker` to protect both 
> the NMT accounting and the memory operation itself. 
> 
> ### Other notes:
> I also simplified this  pattern found in many places:
> 
> if (MemTracker::enabled()) {
>   MemTracker::NmtVirtualMemoryLocker nvml;
>   result = pd_some_operation(addr, bytes);
>   if (result != nullptr) {
> MemTracker::record_some_operation(addr, bytes);
>   }
> } else {
>   result = pd_unmap_memory(addr, bytes);
> }
> ``` 
> To:
> 
> MemTracker::NmtVirtualMemoryLocker nvml;
> result = pd_unmap_memory(addr, bytes);
> MemTracker::record_some_operation(addr, bytes);
> ``` 
> This is possible because `NmtVirtualMemoryLocker` now checks 
> `MemTracker::enabled()`.  `MemTracker::record_some_operation` already checks 
> `MemTracker::enabled()` and checks against nullptr.  This refactoring 
> previously wasn't possible because `ThreadCritical` was used before 
> https://github.com/openjdk/jdk/pull/22745 introduced 
> `NmtVirtualMemoryLocker`. 
> 
> I considered moving the locking and NMT accounting down into platform 
> specific code:  Ex. lock around { munmap() + MemTracker::record }. The hope 
> was that this would help reduce the size of the critical section. However, I 
> found that the OS-specific "pd_" functions are already short and 
> to-the-point, so doing this wasn't reducing the lock scope very much. Instead 
> it just makes the code more messy by having to maintain the locking and NMT 
> accounting in each platform specific implementation. 
> 
> In many places I've done minor refactoring by relocating call...

Ping @JoKern65 for AIX

-

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748477402


RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock

2025-03-24 Thread Robert Toyonaga
### Summary:
This PR makes memory operations atomic with NMT accounting.

### The problem:
In memory related functions like `os::commit_memory` and `os::reserve_memory` 
the OS memory operations are currently done before acquiring the the NMT mutex. 
And the the virtual memory accounting is done later in `MemTracker`, after the 
lock has been acquired. Doing the memory operations outside of the lock scope 
can lead to races. 

1.1 Thread_1 releases range_A.
1.2 Thread_1 tells NMT "range_A has been released".

2.1 Thread_2 reserves (the now free) range_A.
2.2 Thread_2 tells NMT "range_A is reserved".

Since the sequence (1.1)  (1.2) is not atomic, if Thread_2 begins operating 
after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid 
subsequent calls (release range_A, followed by map range_A). But NMT sees 
"reserve range_A", "release range_A" and is now out of sync with the OS.

### Solution:
Where memory operations such as reserve, commit, or release virtual memory 
happen,  I've expanded the scope of `NmtVirtualMemoryLocker` to protect both 
the NMT accounting and the memory operation itself. 

### Other notes:
I also simplified this  pattern found in many places:

if (MemTracker::enabled()) {
  MemTracker::NmtVirtualMemoryLocker nvml;
  result = pd_some_operation(addr, bytes);
  if (result != nullptr) {
MemTracker::record_some_operation(addr, bytes);
  }
} else {
  result = pd_unmap_memory(addr, bytes);
}
``` 
To:

MemTracker::NmtVirtualMemoryLocker nvml;
result = pd_unmap_memory(addr, bytes);
MemTracker::record_some_operation(addr, bytes);
``` 
This is possible because `NmtVirtualMemoryLocker` now checks 
`MemTracker::enabled()`.  `MemTracker::record_some_operation` already checks 
`MemTracker::enabled()` and checks against nullptr.  This refactoring 
previously wasn't possible because `ThreadCritical` was used before 
https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. 

I considered moving the locking and NMT accounting down into platform specific 
code:  Ex. lock around { munmap() + MemTracker::record }. The hope was that 
this would help reduce the size of the critical section. However, I found that 
the OS-specific "pd_" functions are already short and to-the-point, so doing 
this wasn't reducing the lock scope very much. Instead it just makes the code 
more messy by having to maintain the locking and NMT accounting in each 
platform specific implementation. 

In many places I've done minor refactoring by relocating calls to `MemTracker` 
in order to tighten the locking scope.

In some OS specific code (such as `os::map_memory_to_file`), I've replaced 
calls to `os::release_memory` with `os::pd_release_memory`. This is to avoid 
`NmtVirtualMemoryLocker` reentrancy.

In a few places (such as `VirtualMemoryTracker::add_reserved_region`) I have 
replaced `tty` with `defaultStream::output_stream()`. Otherwise 
`NmtVirtualMemory_lock` would be acquired out of rank order with  `tty_lock`. 

### Testing:
One concern, due to the expanded critical section, is reentrancy. 
`NmtVirtualMemoryLocker` is a HotSpot mutex and is not reentrant. I've added 
new tests in _test_os.cpp_ and _test_virtualMemoryTracker.cpp_ that try to 
exercise any usages of NMT that weren't already exercised by existing tests.

tier1 passes on linux and windows. I do not have an AIX machine to test on. Can 
someone please help run the tests on AIX?

-

Commit messages:
 - make memory op and NMT accounting atomic

Changes: https://git.openjdk.org/jdk/pull/24084/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341491
  Stats: 291 lines in 12 files changed: 186 ins; 42 del; 63 mod
  Patch: https://git.openjdk.org/jdk/pull/24084.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084

PR: https://git.openjdk.org/jdk/pull/24084


Re: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock

2025-03-24 Thread Thomas Stuefe
On Mon, 24 Mar 2025 16:16:34 GMT, Stefan Karlsson  wrote:

>> ### Summary:
>> This PR makes memory operations atomic with NMT accounting.
>> 
>> ### The problem:
>> In memory related functions like `os::commit_memory` and 
>> `os::reserve_memory` the OS memory operations are currently done before 
>> acquiring the the NMT mutex. And the the virtual memory accounting is done 
>> later in `MemTracker`, after the lock has been acquired. Doing the memory 
>> operations outside of the lock scope can lead to races. 
>> 
>> 1.1 Thread_1 releases range_A.
>> 1.2 Thread_1 tells NMT "range_A has been released".
>> 
>> 2.1 Thread_2 reserves (the now free) range_A.
>> 2.2 Thread_2 tells NMT "range_A is reserved".
>> 
>> Since the sequence (1.1)  (1.2) is not atomic, if Thread_2 begins operating 
>> after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid 
>> subsequent calls (release range_A, followed by map range_A). But NMT sees 
>> "reserve range_A", "release range_A" and is now out of sync with the OS.
>> 
>> ### Solution:
>> Where memory operations such as reserve, commit, or release virtual memory 
>> happen,  I've expanded the scope of `NmtVirtualMemoryLocker` to protect both 
>> the NMT accounting and the memory operation itself. 
>> 
>> ### Other notes:
>> I also simplified this  pattern found in many places:
>> 
>> if (MemTracker::enabled()) {
>>   MemTracker::NmtVirtualMemoryLocker nvml;
>>   result = pd_some_operation(addr, bytes);
>>   if (result != nullptr) {
>> MemTracker::record_some_operation(addr, bytes);
>>   }
>> } else {
>>   result = pd_unmap_memory(addr, bytes);
>> }
>> ``` 
>> To:
>> 
>> MemTracker::NmtVirtualMemoryLocker nvml;
>> result = pd_unmap_memory(addr, bytes);
>> MemTracker::record_some_operation(addr, bytes);
>> ``` 
>> This is possible because `NmtVirtualMemoryLocker` now checks 
>> `MemTracker::enabled()`.  `MemTracker::record_some_operation` already checks 
>> `MemTracker::enabled()` and checks against nullptr.  This refactoring 
>> previously wasn't possible because `ThreadCritical` was used before 
>> https://github.com/openjdk/jdk/pull/22745 introduced 
>> `NmtVirtualMemoryLocker`. 
>> 
>> I considered moving the locking and NMT accounting down into platform 
>> specific code:  Ex. lock around { munmap() + MemTracker::record }. The hope 
>> was that this would help reduce the size of the critical section. However, I 
>> found that the OS-specific "pd_" functions are already short and 
>> to-the-point, so doing this wasn't reducing the lock scope very much. 
>> Instead it just makes the code more messy by having to maintain the locking 
>> and NMT accounting in each platform specific i...
>
> I don't see why we need to extend the lock to be held over the 
> reserve/commit/alloc part.
> 
> If we only extend the lock on the release side, then it looks like we get the 
> required exclusion:
> 
> lock
> 1.1 Thread_1 releases range_A.
> 1.2 Thread_1 tells NMT "range_A has been released".
> unlock
> 
> 2.1 Thread_2 reserves (the now free) range_A.
> lock
> 2.2 Thread_2 tells NMT "range_A is reserved".
> unlock
> 
> We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) 
> (2.2) (1.2).
> 
> And isn't this locking scheme exactly what the current code is using? Have 
> you seen an issue that this proposed PR intends to solve? If there is such a 
> problem I wonder if there's just a missing lock extension in one of the 
> "release" operations.

@stefank 

> And isn't this locking scheme exactly what the current code is using? Have 
> you seen an issue that this proposed PR intends to solve? If there is such a 
> problem I wonder if there's just a missing lock extension in one of the 
> "release" operations.

What about the case where one thread reserves a range and another thread 
releases it? 

1 Thread A reserves range
2 Thread B releases range
3 Thread B tells NMT "range released"
4 Thread A tells NMT "range reserved"

This would either result in an assert in NMT at step 3 when releasing a range 
NMT does not know. Or in an incorrectly booked range in step 4 without asserts. 
Am I making a thinking error somewhere?

-

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748815516


Re: RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently [v2]

2025-03-24 Thread Kevin Walls
> These tests have always silently permitted a -1 return value from 
> OperatingSystemMXBean CPU time methods.
> 
> They need to be stricter, but occasionally Windows 2019 returns a -1 for the 
> first few calls of these methods.  This seems to be a Windows 2019 bug or 
> peculiarity.  Other Windows versions are not affected.
> 
> GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the CPU 
> time calls continually return -1.  They should permit -1 values, as long as 
> subsequently a value in the valid range is read.
> 
> The GetProcessCpuTime test also needs to retry enough times to expect no -1 
> values, and not just skip.  While updating this test: it has a maximum 
> expected value of Long.MAX_VALUE, which it may as well reduce to something 
> that does not look like a binary "all ones except for the high bit" value 
> (without creating an ongoing game where we keep increasing the value to avoid 
> failures in slow runs).

Kevin Walls has updated the pull request incrementally with two additional 
commits since the last revision:

 - whitespace
 - whitespace

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/24186/files
  - new: https://git.openjdk.org/jdk/pull/24186/files/21ebab5b..2cfd63ce

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24186&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24186&range=00-01

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/24186.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24186/head:pull/24186

PR: https://git.openjdk.org/jdk/pull/24186


RFR: 8352392: AIX: implement attach API v2 and streaming output

2025-03-24 Thread Varada M
AIX changes for attach API to support arbitrary length arguments and the 
streaming output support. 
serviceability/attach/AttachAPIv2/StreamingOutputTest.java test passes

tier1, tier2 and tier3 testing is successful with fastdebug level

JBS Issue : [JDK-8352392](https://bugs.openjdk.org/browse/JDK-8352392)

-

Commit messages:
 - AIX: implement attach API v2 and streaming output
 - AIX: implement attach API v2 and streaming output
 - AIX: implement attach API v2 and streaming output
 - AIX: Problem list serviceability/attach/AttachAPIv2/StreamingOutputTest.java

Changes: https://git.openjdk.org/jdk/pull/24177/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24177&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8352392
  Stats: 279 lines in 2 files changed: 63 ins; 201 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/24177.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24177/head:pull/24177

PR: https://git.openjdk.org/jdk/pull/24177


RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently

2025-03-24 Thread Kevin Walls
These tests have always silently permitted a -1 return value from 
OperatingSystemMXBean CPU time methods.

They need to be stricter, but occasionally Windows 2019 returns a -1 for the 
first few calls of these methods.  This seems to be a Windows 2019 bug or 
peculiarity.  Other Windows versions are not affected.

GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the CPU 
time calls continually return -1.  They should permit -1 values, as long as 
subsequently a value in the valid range is read.

The GetProcessCpuTime test also needs to retry enough times to expect no -1 
values, and not just skip.  While updating this test: it has a maximum expected 
value of Long.MAX_VALUE, which it may as well reduce to something that does not 
look like a binary "all ones except for the high bit" value (without creating 
an ongoing game where we keep increasing the value to avoid failures in slow 
runs).

-

Commit messages:
 - whitespace
 - problemlist
 - Merge remote-tracking branch 'upstream/master' into 
8351002_OSMXBean_CPU_Tests
 - 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail 
intermittently

Changes: https://git.openjdk.org/jdk/pull/24186/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24186&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8351002
  Stats: 59 lines in 4 files changed: 44 ins; 2 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/24186.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24186/head:pull/24186

PR: https://git.openjdk.org/jdk/pull/24186