On Mon, 28 Oct 2024 16:12:39 GMT, Robert Toyonaga <d...@openjdk.org> wrote:
>> ### Summary >> This PR just replaces `ThreadCritical` with a lock specific to NMT. >> `ThreadCritical` is a big lock and is unnecessary for the purposes of NMT. >> I've implemented the new lock with a semaphore so that it can be used early >> before VM init. There is also the possibility of adding assertions in >> places we expect NMT to have synchronization. I haven't added assertions yet >> in many places because some code paths such as the (NMT tests) don't lock >> yet. I think it makes sense to close any gaps in locking in another PR in >> which I can also add more assertions. >> >> Testing: >> - hotspot_nmt >> - gtest:VirtualSpace >> - tier1 > > Robert Toyonaga has updated the pull request incrementally with two > additional commits since the last revision: > > - add a comment explaining lock rank > - remove unnecessary dropping of tracking level I had to analyze this again, to understand why we need this locking, since my mind slips. I updated my findings in https://bugs.openjdk.org/browse/JDK-8325890 . Please see details there. But I don't think the current locking makes much sense, tbh (even before this patch). Or I don't get it. We have three counters in play here: A) the `malloc[mtChunk]` malloc counter for the mtChunk tag B) the global malloc counter C) the `arena[xxx]` arena counter for the mtXXX tag the arena is tagged with Upon arena growth, we incremement all four. In two steps - first (A) and (B) under os::malloc, then (C) in the arena code. Upon arena death, we may delete the chunk, in which case we adjust all three counters as well. But since we don't want (B) to include arena sizes, we have several places where we lazily adjust (B) and (A) from the sum of all arena counters (see JBS issue). These adjustments must be synchronized with arena allocations. But we don't do this. The lock does not include (C). For that, the lock would have to be at a different place, inside arena code. We only lock around (A) and (B). I am not sure what the point of this whole thing is. I am also not convinced that it always works correctly. This whole "updating counters lazily" business makes the code brittle. Also, the current locking does not guarantee correctness anyway. There are several places in the coding where we calculate e.g. the sum of all mallocs, but malloc counters are modified (rightly so) out of lock protection. To make matters even more complex, we have not two locks here but three: - ThreadCritical - the new `NmtVirtualMemory_lock` - we also have the `NMTQuery_lock`, which guards NMT reports from jcmd exclusively Not sure what the point of the latter even is. It certainly does not prevent us from getting reports while underlying data structures are being changed, unless I am overlooking something. It would be very nice if someone other than me could analyze this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20852#issuecomment-2449568622