On Thu, 23 Nov 2023 00:48:19 GMT, David Holmes <dhol...@openjdk.org> wrote:
>> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump >> asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The >> reasoning was that you should never have an owned ObjectMonitor with a dead >> object. I added an assert to check this assumption. It turns out that the >> assumption was wrong *if* you use JNI to call MonitorEnter and then remove >> all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes >> more correct when this filter was removed. Previously, the locked monitors >> never got unlocked because the ObjectMonitor iterator never exposed these >> monitors to the JNI detach code that unlocks the thread's monitors. That bug >> caused an ObjectMonitor leak. So, for this case I'm leaving these >> ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so >> I'm filtering those in the closure that collects ObjectMonitor. Side note: >> We have discussions about ways to completely rewrite this by letting each >> thread have thread-local information about JNI held locks. If we have this >> we could probably throw away the entire ObjectMonitorDump hashtable, and its >> walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird >> ObjectMonitor. If we do, then the users can detect that a thread holds a >> lock with a dead object, and the code will return NULL as one of the "owned >> monitors" returned. I don't think that's a good idea, so I'm filtering out >> these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > >> Previously, the locked monitors never got unlocked because the >> ObjectMonitor iterator never exposed these monitors to the JNI detach code > > I had not realized that. It explains some confusion in a separate issue I > had been looking into! It is important that these monitors are exposed and > unlocked at detach time, otherwise it also messes up the `held_monitor_count`. > >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so >> I'm filtering those in the closure > > I think we may need to make that code tolerate the absence of an object. > >> For GetOwnedMonitorInfo it is unclear if we should expose these weird >> ObjectMonitor. > > I think we probably should expose this to be accurate, but I think this needs > investigation on the JVMTI side to ensure that the null entry is tolerated > okay. So a separate RFE to handle this would be fine. > > Thanks While addressing @dholmes-ora's comments about the tests I found that the JNI implementation was incorrect and caused a failure, that seems to prevent the thread detach, and then the Java layer thread dumping caught the thread dumping bug. I'm going to see if I can split the various test cases so that they can be tested individually. Don't look too closely at the current test until it has been updated. > test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line > 94: > >> 92: >> 93: // Let the GC clear the weak reference to the object. >> 94: system_gc(env); > > AFAIK there is no guarantee that one call to `System.gc()` will suffice to > clear the weakRef. We tend use a loop with a few iterations in other tests, > or use a WhiteBox method to achieve it. In my testing I used the finalizer to > observe that the objects had been finalized but even then, and with a loop, I > did not always see them collected with G1. ZGC will clear the weak reference. G1 clears the weak reference, except for in a few specific situations. I've verified that G1 clears this weak reference, so I don't think it would be worth making this test longer or more complicated. > test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line > 104: > >> 102: // source of at least two bugs: >> 103: // - When the object reference in the monitor was made weak, the code >> 104: // didn't unlock the monitor, leaving it lingering in the system. > > Suggestion: > > // - When the object reference in the monitor was cleared, the monitor > // iterator code would skip it, preventing it from being unlocked when > // the owner thread detached, leaving it lingering in the system. > > the original made it sound to me like the code that cleared the reference > (i.e. the GC) was expected to do the unlocking. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824116115 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403168662 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403169323