Re: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock
On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga wrote: > ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory related functions like `os::commit_memory` and `os::reserve_memory` > the OS memory operations are currently done before acquiring the the NMT > mutex. And the the virtual memory accounting is done later in `MemTracker`, > after the lock has been acquired. Doing the memory operations outside of the > lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating > after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid > subsequent calls (release range_A, followed by map range_A). But NMT sees > "reserve range_A", "release range_A" and is now out of sync with the OS. > > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory > happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both > the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_unmap_memory(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_unmap_memory(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks > `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks > `MemTracker::enabled()` and checks against nullptr. This refactoring > previously wasn't possible because `ThreadCritical` was used before > https://github.com/openjdk/jdk/pull/22745 introduced > `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform > specific code: Ex. lock around { munmap() + MemTracker::record }. The hope > was that this would help reduce the size of the critical section. However, I > found that the OS-specific "pd_" functions are already short and > to-the-point, so doing this wasn't reducing the lock scope very much. Instead > it just makes the code more messy by having to maintain the locking and NMT > accounting in each platform specific implementation. > > In many places I've done minor refactoring by relocating call... I don't see why we need to extend the lock to be held over the reserve/commit/alloc part. If we only extend the lock on the release side, then it looks like we get the required exclusion: lock 1.1 Thread_1 releases range_A. 1.2 Thread_1 tells NMT "range_A has been released". unlock 2.1 Thread_2 reserves (the now free) range_A. lock 2.2 Thread_2 tells NMT "range_A is reserved". unlock We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) (2.2) (1.2). And isn't this locking scheme exactly what the current code is using? Have you seen an issue that this proposed PR intends to solve? If there is such a problem I wonder if there's just a missing lock extension in one of the "release" operations. - Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24084#pullrequestreview-2710963414
Re: RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently [v2]
On Mon, 24 Mar 2025 15:08:01 GMT, Kevin Walls wrote: >> These tests have always silently permitted a -1 return value from >> OperatingSystemMXBean CPU time methods. >> >> They need to be stricter, but occasionally Windows 2019 returns a -1 for the >> first few calls of these methods. This seems to be a Windows 2019 bug or >> peculiarity. Other Windows versions are not affected. >> >> GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the >> CPU time calls continually return -1. They should permit -1 values, as long >> as subsequently a value in the valid range is read. >> >> The GetProcessCpuTime test also needs to retry enough times to expect no -1 >> values, and not just skip. While updating this test: it has a maximum >> expected value of Long.MAX_VALUE, which it may as well reduce to something >> that does not look like a binary "all ones except for the high bit" value >> (without creating an ongoing game where we keep increasing the value to >> avoid failures in slow runs). > > Kevin Walls has updated the pull request incrementally with two additional > commits since the last revision: > > - whitespace > - whitespace Changes requested by lmesnik (Reviewer). test/jdk/com/sun/management/OperatingSystemMXBean/GetProcessCpuLoad.java line 45: > 43: for(int i = 0; i < 10; i++) { > 44: load = mbean.getProcessCpuLoad(); > 45: if (load == -1.0) { Please harden test to allow -1.0 on windows only. - PR Review: https://git.openjdk.org/jdk/pull/24186#pullrequestreview-2711575803 PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010862043
Re: RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently
On Mon, 24 Mar 2025 10:13:34 GMT, Kevin Walls wrote: > These tests have always silently permitted a -1 return value from > OperatingSystemMXBean CPU time methods. > > They need to be stricter, but occasionally Windows 2019 returns a -1 for the > first few calls of these methods. This seems to be a Windows 2019 bug or > peculiarity. Other Windows versions are not affected. > > GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the CPU > time calls continually return -1. They should permit -1 values, as long as > subsequently a value in the valid range is read. > > The GetProcessCpuTime test also needs to retry enough times to expect no -1 > values, and not just skip. While updating this test: it has a maximum > expected value of Long.MAX_VALUE, which it may as well reduce to something > that does not look like a binary "all ones except for the high bit" value > (without creating an ongoing game where we keep increasing the value to avoid > failures in slow runs). test/jdk/com/sun/management/OperatingSystemMXBean/GetProcessCpuLoad.java line 43: > 41: double load; > 42: > 43: for(int i = 0; i < 10; i++) { Suggestion: for (int i = 0; i < 10; i++) { test/jdk/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java line 43: > 41: double load; > 42: > 43: for(int i = 0; i < 10; i++) { Suggestion: for (int i = 0; i < 10; i++) { test/jdk/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java line 49: > 47: // Remember a -1 in case it never gets better. > 48: ex = new RuntimeException("getSystemCpuLoad() returns " + > load > 49: + " which is not in the [0.0,1.0] interval"); Suggestion: + " which is not in the [0.0,1.0] interval"); test/jdk/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java line 53: > 51: } else if (load < 0.0 || load > 1.0) { > 52: throw new RuntimeException("getSystemCpuLoad() returns " > + load > 53:+ " which is not in the [0.0,1.0] interval"); Suggestion: + " which is not in the [0.0,1.0] interval"); - PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010344429 PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010344971 PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010345618 PR Review Comment: https://git.openjdk.org/jdk/pull/24186#discussion_r2010346257
Re: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock
On Mon, 17 Mar 2025 16:20:42 GMT, Robert Toyonaga wrote: > ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory related functions like `os::commit_memory` and `os::reserve_memory` > the OS memory operations are currently done before acquiring the the NMT > mutex. And the the virtual memory accounting is done later in `MemTracker`, > after the lock has been acquired. Doing the memory operations outside of the > lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating > after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid > subsequent calls (release range_A, followed by map range_A). But NMT sees > "reserve range_A", "release range_A" and is now out of sync with the OS. > > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory > happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both > the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_unmap_memory(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_unmap_memory(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks > `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks > `MemTracker::enabled()` and checks against nullptr. This refactoring > previously wasn't possible because `ThreadCritical` was used before > https://github.com/openjdk/jdk/pull/22745 introduced > `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform > specific code: Ex. lock around { munmap() + MemTracker::record }. The hope > was that this would help reduce the size of the critical section. However, I > found that the OS-specific "pd_" functions are already short and > to-the-point, so doing this wasn't reducing the lock scope very much. Instead > it just makes the code more messy by having to maintain the locking and NMT > accounting in each platform specific implementation. > > In many places I've done minor refactoring by relocating call... Ping @JoKern65 for AIX - PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748477402
RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock
### Summary: This PR makes memory operations atomic with NMT accounting. ### The problem: In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the the NMT mutex. And the the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. 1.1 Thread_1 releases range_A. 1.2 Thread_1 tells NMT "range_A has been released". 2.1 Thread_2 reserves (the now free) range_A. 2.2 Thread_2 tells NMT "range_A is reserved". Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. ### Solution: Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. ### Other notes: I also simplified this pattern found in many places: if (MemTracker::enabled()) { MemTracker::NmtVirtualMemoryLocker nvml; result = pd_some_operation(addr, bytes); if (result != nullptr) { MemTracker::record_some_operation(addr, bytes); } } else { result = pd_unmap_memory(addr, bytes); } ``` To: MemTracker::NmtVirtualMemoryLocker nvml; result = pd_unmap_memory(addr, bytes); MemTracker::record_some_operation(addr, bytes); ``` This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific implementation. In many places I've done minor refactoring by relocating calls to `MemTracker` in order to tighten the locking scope. In some OS specific code (such as `os::map_memory_to_file`), I've replaced calls to `os::release_memory` with `os::pd_release_memory`. This is to avoid `NmtVirtualMemoryLocker` reentrancy. In a few places (such as `VirtualMemoryTracker::add_reserved_region`) I have replaced `tty` with `defaultStream::output_stream()`. Otherwise `NmtVirtualMemory_lock` would be acquired out of rank order with `tty_lock`. ### Testing: One concern, due to the expanded critical section, is reentrancy. `NmtVirtualMemoryLocker` is a HotSpot mutex and is not reentrant. I've added new tests in _test_os.cpp_ and _test_virtualMemoryTracker.cpp_ that try to exercise any usages of NMT that weren't already exercised by existing tests. tier1 passes on linux and windows. I do not have an AIX machine to test on. Can someone please help run the tests on AIX? - Commit messages: - make memory op and NMT accounting atomic Changes: https://git.openjdk.org/jdk/pull/24084/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341491 Stats: 291 lines in 12 files changed: 186 ins; 42 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/24084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084 PR: https://git.openjdk.org/jdk/pull/24084
Re: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock
On Mon, 24 Mar 2025 16:16:34 GMT, Stefan Karlsson wrote: >> ### Summary: >> This PR makes memory operations atomic with NMT accounting. >> >> ### The problem: >> In memory related functions like `os::commit_memory` and >> `os::reserve_memory` the OS memory operations are currently done before >> acquiring the the NMT mutex. And the the virtual memory accounting is done >> later in `MemTracker`, after the lock has been acquired. Doing the memory >> operations outside of the lock scope can lead to races. >> >> 1.1 Thread_1 releases range_A. >> 1.2 Thread_1 tells NMT "range_A has been released". >> >> 2.1 Thread_2 reserves (the now free) range_A. >> 2.2 Thread_2 tells NMT "range_A is reserved". >> >> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating >> after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid >> subsequent calls (release range_A, followed by map range_A). But NMT sees >> "reserve range_A", "release range_A" and is now out of sync with the OS. >> >> ### Solution: >> Where memory operations such as reserve, commit, or release virtual memory >> happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both >> the NMT accounting and the memory operation itself. >> >> ### Other notes: >> I also simplified this pattern found in many places: >> >> if (MemTracker::enabled()) { >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> if (result != nullptr) { >> MemTracker::record_some_operation(addr, bytes); >> } >> } else { >> result = pd_unmap_memory(addr, bytes); >> } >> ``` >> To: >> >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_unmap_memory(addr, bytes); >> MemTracker::record_some_operation(addr, bytes); >> ``` >> This is possible because `NmtVirtualMemoryLocker` now checks >> `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks >> `MemTracker::enabled()` and checks against nullptr. This refactoring >> previously wasn't possible because `ThreadCritical` was used before >> https://github.com/openjdk/jdk/pull/22745 introduced >> `NmtVirtualMemoryLocker`. >> >> I considered moving the locking and NMT accounting down into platform >> specific code: Ex. lock around { munmap() + MemTracker::record }. The hope >> was that this would help reduce the size of the critical section. However, I >> found that the OS-specific "pd_" functions are already short and >> to-the-point, so doing this wasn't reducing the lock scope very much. >> Instead it just makes the code more messy by having to maintain the locking >> and NMT accounting in each platform specific i... > > I don't see why we need to extend the lock to be held over the > reserve/commit/alloc part. > > If we only extend the lock on the release side, then it looks like we get the > required exclusion: > > lock > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > unlock > > 2.1 Thread_2 reserves (the now free) range_A. > lock > 2.2 Thread_2 tells NMT "range_A is reserved". > unlock > > We can get ordering (1.1) (2.1) (1.2) (2.2) but we can't get (1.1) (2.1) > (2.2) (1.2). > > And isn't this locking scheme exactly what the current code is using? Have > you seen an issue that this proposed PR intends to solve? If there is such a > problem I wonder if there's just a missing lock extension in one of the > "release" operations. @stefank > And isn't this locking scheme exactly what the current code is using? Have > you seen an issue that this proposed PR intends to solve? If there is such a > problem I wonder if there's just a missing lock extension in one of the > "release" operations. What about the case where one thread reserves a range and another thread releases it? 1 Thread A reserves range 2 Thread B releases range 3 Thread B tells NMT "range released" 4 Thread A tells NMT "range reserved" This would either result in an assert in NMT at step 3 when releasing a range NMT does not know. Or in an incorrectly booked range in step 4 without asserts. Am I making a thinking error somewhere? - PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2748815516
Re: RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently [v2]
> These tests have always silently permitted a -1 return value from > OperatingSystemMXBean CPU time methods. > > They need to be stricter, but occasionally Windows 2019 returns a -1 for the > first few calls of these methods. This seems to be a Windows 2019 bug or > peculiarity. Other Windows versions are not affected. > > GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the CPU > time calls continually return -1. They should permit -1 values, as long as > subsequently a value in the valid range is read. > > The GetProcessCpuTime test also needs to retry enough times to expect no -1 > values, and not just skip. While updating this test: it has a maximum > expected value of Long.MAX_VALUE, which it may as well reduce to something > that does not look like a binary "all ones except for the high bit" value > (without creating an ongoing game where we keep increasing the value to avoid > failures in slow runs). Kevin Walls has updated the pull request incrementally with two additional commits since the last revision: - whitespace - whitespace - Changes: - all: https://git.openjdk.org/jdk/pull/24186/files - new: https://git.openjdk.org/jdk/pull/24186/files/21ebab5b..2cfd63ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24186&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24186&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24186/head:pull/24186 PR: https://git.openjdk.org/jdk/pull/24186
RFR: 8352392: AIX: implement attach API v2 and streaming output
AIX changes for attach API to support arbitrary length arguments and the streaming output support. serviceability/attach/AttachAPIv2/StreamingOutputTest.java test passes tier1, tier2 and tier3 testing is successful with fastdebug level JBS Issue : [JDK-8352392](https://bugs.openjdk.org/browse/JDK-8352392) - Commit messages: - AIX: implement attach API v2 and streaming output - AIX: implement attach API v2 and streaming output - AIX: implement attach API v2 and streaming output - AIX: Problem list serviceability/attach/AttachAPIv2/StreamingOutputTest.java Changes: https://git.openjdk.org/jdk/pull/24177/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24177&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352392 Stats: 279 lines in 2 files changed: 63 ins; 201 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24177.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24177/head:pull/24177 PR: https://git.openjdk.org/jdk/pull/24177
RFR: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently
These tests have always silently permitted a -1 return value from OperatingSystemMXBean CPU time methods. They need to be stricter, but occasionally Windows 2019 returns a -1 for the first few calls of these methods. This seems to be a Windows 2019 bug or peculiarity. Other Windows versions are not affected. GetProcessCpuLoad.java and GetSystemCpuLoad.java need to fail only if the CPU time calls continually return -1. They should permit -1 values, as long as subsequently a value in the valid range is read. The GetProcessCpuTime test also needs to retry enough times to expect no -1 values, and not just skip. While updating this test: it has a maximum expected value of Long.MAX_VALUE, which it may as well reduce to something that does not look like a binary "all ones except for the high bit" value (without creating an ongoing game where we keep increasing the value to avoid failures in slow runs). - Commit messages: - whitespace - problemlist - Merge remote-tracking branch 'upstream/master' into 8351002_OSMXBean_CPU_Tests - 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently Changes: https://git.openjdk.org/jdk/pull/24186/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24186&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351002 Stats: 59 lines in 4 files changed: 44 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24186/head:pull/24186 PR: https://git.openjdk.org/jdk/pull/24186