On Wed, 30 Oct 2024 12:32:17 GMT, Johan Sjölen <jsjo...@openjdk.org> wrote:
>>> Hi @tstuefe, >>> >>> > Ah thank you. I think I added that printing ages ago. I should not have >>> > used tty. The immediate solution to get you going may be to just change >>> > this from tty to fileStream fs(stderr); >>> >>> Thanks! I adopted your suggestion and replaced tty. The lock conflict is >>> gone now. >>> >>> > Ensuring that we feed NMT the tracking information in the order they >>> > happened. This is only relevant for mmap accounting, since only there do >>> > we need to track address ranges. For malloc accounting, we just >>> > increase/decrease counters, and there, the order of operations does not >>> > matter, just that we do it atomically. >>> > ... >>> > A pragmatic solution would be to move the locking down the stack. >>> > Currently, we lock in the generic layer around { pd_release_memory() + >>> > MemTracker::record }. But we only need to lock around the OS allocation >>> > function (mmap/munmap/VirtualAlloc etc). So we could move the locks >>> > downward to the OS specific layer where the actual allocations happen in >>> > the OS code, e.g. just lock around { munmap() + MemTracker::record }. >>> >>> Ok I see. So the locking should indeed not happen inside of `MemTracker`. >>> Yes, maybe moving the locking down the stack into platform specific code >>> would be a good task for the future. However, if the intention is to feed >>> NMT the tracking info in the order it happened, why are the operations on >>> memory in `os::commit_memory` and `os::reserve_memory` not protected by the >>> lock? In those, we delay and lock inside `MemTracker` instead. >> >> That looks like an error that should be fixed. > > Hi @tstuefe, @roberttoyonaga > > We saw some test failures in non-generational ZGC due to this change. That GC > is removed now, so there's no worry there. I did however look at all of our > remaining usages of `ThreadCritical` and saw that there are thrre left that > are related to NMT: > > > src/hotspot/share/nmt/nmtUsage.cpp > 56 ThreadCritical tc; > > src/hotspot/share/nmt/mallocTracker.cpp > 65 // Use ThreadCritical to make sure that mtChunks don't get deallocated > while the > 68 ThreadCritical tc; > > > src/hotspot/share/memory/arena.cpp > 178 ThreadCritical tc; // Free chunks under TC lock so that NMT > adjustment is stable. > > > I am not that familiar with this code. Is leaving these as they are > intentional, or have we introduced a potential bug? > > Thank you. Hi @jdksjolen, I originally replaced those instances, but then [put them back in](https://github.com/openjdk/jdk/pull/20852/commits/88e075d1023bbfa240c5606c9c64ea720a3e83d8) because I saw that they are related to chunks allocation/deallocation code where it seemed `ThreadCritical` was used for other reasons. I'm not certain, but looking at it again, it seems that the `ThreadCritical` uses in `ChunkPool::deallocate_chunk` and `ChunkPool::prune()` are only needed for NMT and are independent of the other `ThreadCritical` uses in that code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20852#issuecomment-2447196115