On Wed, 30 Oct 2024 12:32:17 GMT, Johan Sjölen <jsjo...@openjdk.org> wrote:

>>> Hi @tstuefe,
>>> 
>>> > Ah thank you. I think I added that printing ages ago. I should not have 
>>> > used tty. The immediate solution to get you going may be to just change 
>>> > this from tty to fileStream fs(stderr);
>>> 
>>> Thanks! I adopted your suggestion and replaced tty. The lock conflict is 
>>> gone now.
>>> 
>>> > Ensuring that we feed NMT the tracking information in the order they 
>>> > happened. This is only relevant for mmap accounting, since only there do 
>>> > we need to track address ranges. For malloc accounting, we just 
>>> > increase/decrease counters, and there, the order of operations does not 
>>> > matter, just that we do it atomically.
>>> > ...
>>> > A pragmatic solution would be to move the locking down the stack. 
>>> > Currently, we lock in the generic layer around { pd_release_memory() + 
>>> > MemTracker::record }. But we only need to lock around the OS allocation 
>>> > function (mmap/munmap/VirtualAlloc etc). So we could move the locks 
>>> > downward to the OS specific layer where the actual allocations happen in 
>>> > the OS code, e.g. just lock around { munmap() + MemTracker::record }.
>>> 
>>> Ok I see. So the locking should indeed not happen inside of `MemTracker`. 
>>> Yes, maybe moving the locking down the stack into platform specific code 
>>> would be a good task for the future. However, if the intention is to feed 
>>> NMT the tracking info in the order it happened, why are the operations on 
>>> memory in `os::commit_memory` and `os::reserve_memory` not protected by the 
>>> lock? In those, we delay and lock inside `MemTracker` instead.
>> 
>> That looks like an error that should be fixed.
>
> Hi @tstuefe, @roberttoyonaga 
> 
> We saw some test failures in non-generational ZGC due to this change. That GC 
> is removed now, so there's no worry there. I did however look at all of our 
> remaining usages of `ThreadCritical` and saw that there are thrre left that 
> are related to NMT:
> 
> 
> src/hotspot/share/nmt/nmtUsage.cpp
> 56     ThreadCritical tc;
> 
> src/hotspot/share/nmt/mallocTracker.cpp
> 65     // Use ThreadCritical to make sure that mtChunks don't get deallocated 
> while the
> 68     ThreadCritical tc;
> 
> 
> src/hotspot/share/memory/arena.cpp
> 178      ThreadCritical tc;  // Free chunks under TC lock so that NMT 
> adjustment is stable.
> 
> 
> I am not that familiar with this code. Is leaving these as they are 
> intentional, or have we introduced a potential bug?
> 
> Thank you.

Hi @jdksjolen, I originally replaced those instances, but then [put them back 
in](https://github.com/openjdk/jdk/pull/20852/commits/88e075d1023bbfa240c5606c9c64ea720a3e83d8)
 because I saw that they are related to chunks allocation/deallocation code 
where it seemed `ThreadCritical` was used for other reasons.  

I'm not certain, but looking at it again, it seems that the `ThreadCritical` 
uses in `ChunkPool::deallocate_chunk` and `ChunkPool::prune()` are only needed 
for NMT and are independent of the other `ThreadCritical` uses in that code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20852#issuecomment-2447196115

Reply via email to