Re: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object

Stefan Karlsson Thu, 23 Nov 2023 02:15:11 -0800

On Thu, 23 Nov 2023 00:48:19 GMT, David Holmes <dhol...@openjdk.org> wrote:


>> In the rewrites made for:
>> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump 
>> asserts in interleaved ObjectMonitor::deflate_monitor calls`
>> 
>> I removed the filtering of *owned ObjectMonitors with dead objects*. The 
>> reasoning was that you should never have an owned ObjectMonitor with a dead 
>> object. I added an assert to check this assumption. It turns out that the 
>> assumption was wrong *if* you use JNI to call MonitorEnter and then remove 
>> all references to the locked object.
>> 
>> The provided tests provoke this assert form:
>> * the JNI thread detach code
>> * thread dumping with locked monitors, and
>> * the JVMTI GetOwnedMonitorInfo API.
>> 
>> While investigating this we've found that the thread detach code becomes 
>> more correct when this filter was removed. Previously, the locked monitors 
>> never got unlocked because the ObjectMonitor iterator never exposed these 
>> monitors to the JNI detach code that unlocks the thread's monitors. That bug 
>> caused an ObjectMonitor leak. So, for this case I'm leaving these 
>> ObjectMonitors unfiltered so that we don't reintroduce the leak.
>> 
>> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so 
>> I'm filtering those in the closure that collects ObjectMonitor. Side note: 
>> We have discussions about ways to completely rewrite this by letting each 
>> thread have thread-local information about JNI held locks. If we have this 
>> we could probably throw away the entire ObjectMonitorDump hashtable, and its 
>> walk of the `_in_use_list.`.
>> 
>> For GetOwnedMonitorInfo it is unclear if we should expose these weird 
>> ObjectMonitor. If we do, then the users can detect that a thread holds a 
>> lock with a dead object, and the code will return NULL as one of the "owned 
>> monitors" returned. I don't think that's a good idea, so I'm filtering out 
>> these ObjectMonitor for those calls.
>> 
>> Test: the written tests with and without the fix. Tier1-Tier3, so far.
>
>>  Previously, the locked monitors never got unlocked because the 
>> ObjectMonitor iterator never exposed these monitors to the JNI detach code
> 
> I had not realized that. It explains some confusion in  a separate issue I 
> had been looking into! It is important that these monitors are exposed and 
> unlocked at detach time, otherwise it also messes up the `held_monitor_count`.
> 
>> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so 
>> I'm filtering those in the closure
> 
> I think we may need to make that code tolerate the absence of an object.
> 
>> For GetOwnedMonitorInfo it is unclear if we should expose these weird 
>> ObjectMonitor.
> 
> I think we probably should expose this to be accurate, but I think this needs 
> investigation on the JVMTI side to ensure that the null entry is tolerated 
> okay. So a separate RFE to handle this would be fine.
> 
> Thanks

While addressing @dholmes-ora's comments about the tests I found that the JNI 
implementation was incorrect and caused a failure, that seems to prevent the 
thread detach, and then the Java layer thread dumping caught the thread dumping 
bug. I'm going to see if I can split the various test cases so that they can be 
tested individually. Don't look too closely at the current test until it has 
been updated.

> test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 
> 94:
> 
>> 92: 
>> 93:   // Let the GC clear the weak reference to the object.
>> 94:   system_gc(env);
> 
> AFAIK there is no guarantee that one call to `System.gc()` will suffice to 
> clear the weakRef. We tend use a loop with a few iterations in other tests, 
> or use a WhiteBox method to achieve it. In my testing I used the finalizer to 
> observe that the objects had been finalized but even then, and with a loop, I 
> did not always see them collected with G1.

ZGC will clear the weak reference. G1 clears the weak reference, except for in 
a few specific situations. I've verified that G1 clears this weak reference, so 
I don't think it would be worth making this test longer or more complicated.

> test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 
> 104:
> 
>> 102:   // source of at least two bugs:
>> 103:   // - When the object reference in the monitor was made weak, the code
>> 104:   //   didn't unlock the monitor, leaving it lingering in the system.
> 
> Suggestion:
> 
> // - When the object reference in the monitor was cleared, the monitor
> //   iterator code would skip it, preventing it from being unlocked when
> //   the owner thread detached, leaving it lingering in the system.
> 
> the original made it sound to me like the code that cleared the reference 
> (i.e. the GC) was expected to do the unlocking.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824116115
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403168662
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403169323

Re: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object

Reply via email to