JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

Roman Kennke Mon, 25 Sep 2023 13:02:50 -0700

On Mon, 25 Sep 2023 19:18:38 GMT, Chris Plummer <cjplum...@openjdk.org> wrote:


>> Yes. But unless we run this at a safepoint, there is no consistent state 
>> when we race with locking. I don't think we want to spam users with warnings 
>> whenever we run into anonymous owners (which is not too infrequent). 
>> Existing (pre-lightweight-) locking doesn't print warnings either, when a 
>> thread that's expected to lock on an object has moved on or otherwise 
>> encounters racy state.
>
> I need a better understanding of anonymous owners and the race you think is 
> going on here. 99+% of the time if we are not at a safepoint, pretty much 
> everything still works fine with SA. Safe points guarantee safe access, but 
> not being at a safepoint does not guarantee there will be an issue for SA. It 
> usually still works, and we try to report when it doesn't.

With the new lightweight locking, when a thread T1 holds a lightweight lock on 
object O, and another thread T2 comes in and also wants to acquire that lock, 
it inflates the lock to a monitor. And since it cannot determine which threads 
currently locks O, it installs a special owner - which we call anonymous owner 
- in the ObjectMonitor, it basically means 'we don't know'. As soon as T1 
unlocks O, it observes that the lock is now inflated and set to anonymous 
owner, installs itself as the proper owner of the ObjectMonitor and proceeds to 
unlock the ObjectMonitor - thereby handing over ownership to T2.

The specific race here is that SA sees an anonymously locked ObjectMonitor and 
tries to find the owning thread, and fails, presumably because that thread has 
moved on and unlocked the object in the meantime.

However, the race is more general, and pre-(lightweight-locking-)existing: if 
SA observes a locked object, or a thread which holds a particular lock, by the 
time it is reporting, this information may (likely) be wrong because the thread 
has, in the meantime, moved on. This in itself may not be a big problem. What 
may be worse is that, if SA prints stacks of all threads and also prints 
locking information, the overall information may be *inconsistent*. For 
example, it may report that two threads hold the same object.
Also, I am not sure how deep SA goes. Telling from the stack-trace in the 
original report, it looks like there is a DeadlockDetector involved. If threads 
don't hold still while we try to detect deadlocks, we may report false 
positives. (We would perhaps not report false negatives, because deadlocked 
thread by definition don't move ;-) )

I don't know enough about SA and how it deals with locking to judge how much of 
a problem all that is, but I think the pre-existing code (before LW locking) 
did not warn about locking raciness, and I don't really see that we should do 
this for the anonymous case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336332299

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

Reply via email to