On Mon, 28 Oct 2024 16:12:39 GMT, Robert Toyonaga <d...@openjdk.org> wrote:

>> ### Summary
>> This PR just replaces `ThreadCritical` with a lock specific to NMT.  
>> `ThreadCritical` is a big lock and is unnecessary for the purposes of NMT. 
>> I've implemented the new lock with a semaphore so that it can be used early 
>> before VM init.  There is also the possibility of adding assertions in 
>> places we expect NMT to have synchronization. I haven't added assertions yet 
>> in many places because some code paths such as the (NMT tests)  don't lock 
>> yet. I think it makes sense to close any gaps in locking in another PR in 
>> which I can also add more assertions. 
>> 
>> Testing:
>> - hotspot_nmt
>> - gtest:VirtualSpace
>> - tier1
>
> Robert Toyonaga has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - add a comment explaining lock rank
>  - remove unnecessary dropping of tracking level

I had to analyze this again, to understand why we need this locking, since my 
mind slips.

I updated my findings in https://bugs.openjdk.org/browse/JDK-8325890 . Please 
see details there.

But I don't think the current locking makes much sense, tbh (even before this 
patch). Or I don't get it.

We have three counters in play here:
A) the `malloc[mtChunk]` malloc counter for the mtChunk tag
B) the global malloc counter
C) the `arena[xxx]` arena counter for the mtXXX tag the arena is tagged with

Upon arena growth, we incremement all four. In two steps - first (A) and (B) 
under os::malloc, then (C) in the arena code. 

Upon arena death, we may delete the chunk, in which case we adjust all three 
counters as well.

But since we don't want (B) to include arena sizes, we have several places 
where we lazily adjust (B) and (A) from the sum of all arena counters (see JBS 
issue). These adjustments must be synchronized with arena allocations. But we 
don't do this. The lock does not include (C). For that, the lock would have to 
be at a different place, inside arena code. We only lock around (A) and (B).

I am not sure what the point of this whole thing is. I am also not convinced 
that it always works correctly. This whole "updating counters lazily" business 
makes the code brittle.

Also, the current locking does not guarantee correctness anyway. There are 
several places in the coding where we calculate e.g. the sum of all mallocs, 
but malloc counters are modified (rightly so) out of lock protection.

To make matters even more complex, we have not two locks here but three:
- ThreadCritical
- the new `NmtVirtualMemory_lock`
- we also have the `NMTQuery_lock`, which guards NMT reports from jcmd 
exclusively

Not sure what the point of the latter even is. It certainly does not prevent us 
from getting reports while underlying data structures are being changed, unless 
I am overlooking something.

It would be very nice if someone other than me could analyze this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20852#issuecomment-2449568622

Reply via email to