On Wed, 22 Nov 2023 15:00:29 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:
> In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump > asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The > reasoning was that you should never have an owned ObjectMonitor with a dead > object. I added an assert to check this assumption. It turns out that the > assumption was wrong *if* you use JNI to call MonitorEnter and then remove > all references to the locked object. > > The provided tests provoke this assert form: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code becomes more > correct when this filter was removed. Previously, the locked monitors never > got unlocked because the ObjectMonitor iterator never exposed these monitors > to the JNI detach code that unlocks the thread's monitors. That bug caused an > ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors > unfiltered so that we don't reintroduce the leak. > > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so > I'm filtering those in the closure that collects ObjectMonitor. Side note: We > have discussions about ways to completely rewrite this by letting each thread > have thread-local information about JNI held locks. If we have this we could > probably throw away the entire ObjectMonitorDump hashtable, and its walk of > the `_in_use_list.`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird > ObjectMonitor. If we do, then the users can detect that a thread holds a lock > with a dead object, and the code will return NULL as one of the "owned > monitors" returned. I don't think that's a good idea, so I'm filtering out > these ObjectMonitor for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far. Functional fix is simple and fine but quite a lot of commentary on the tests. Thanks src/hotspot/share/runtime/vmOperations.cpp line 354: > 352: // alive. Filter out monitors with dead objects. > 353: return; > 354: } I don't think we need to do this, but even without this filtering I ran a number of tests and was unable to demonstrate any problem. The JNI locked monitor seems to be "invisible" to the frame that locked it and so the thread dump never encounters it. Were you able to provoke a failure here or is this defensive programming? test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 28: > 26: * @test IterateMonitorWithDeadObjectTest > 27: * @summary This locks a monitor, GCs the object, and iterate and perform > 28: * various iteration and operations over this monitor. This doesn't read right with "iterate" and "iteration". Not sure exactly what you were trying to say. test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 29: > 27: * @summary This locks a monitor, GCs the object, and iterate and perform > 28: * various iteration and operations over this monitor. > 29: * @requires os.family == "linux" I know the test this was copied from had this but I'm not sure it is actually a necessary restriction - any Posix platform should work. Though maybe perror is linux only ... test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 31: > 29: * @requires os.family == "linux" > 30: * @library /testlibrary /test/lib > 31: * @build IterateMonitorWithDeadObjectTest You don't need an explicit `@build` step test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 40: > 38: public class IterateMonitorWithDeadObjectTest { > 39: public static native void runTestAndDetachThread(); > 40: public static native void joinTestThread(); I don't think this form of the test needs to separate out the `pthread_join()`, it can just be done in `runTestAndDetachThread` AFAICS. I originally split it out to allow the Java code to do the GC while the native thread was sleeping prior to detaching. test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 57: > 55: // - Drop the last reference to the object > 56: // - GC to clear the weak reference to the object in the monitor > 57: // - Detach the thread - provoke previous bug It also does a thread dump while the lock is held test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 66: > 64: // dead object. The thread dumping code didn't tolerate such a > monitor, > 65: // so run a thread dump and make sure that it doesn't > crash/assert. > 66: dumpThreadsWithLockedMonitors(); But you've already detached the thread so there is no locked monitor any longer. And `runTestAndDetachThread()` also did a thread dump. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 43: > 41: static jobject create_object(JNIEnv* env) { > 42: jclass clazz = (*env)->FindClass(env, "java/lang/Object"); > 43: if (clazz == 0) die("No class"); The `die` method is for errors with system calls. It won't show useful information for JNI calls that leave exceptions pending. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 76: > 74: if (dumpAllThreadsMethod == 0) die("No dumpAllThreads method"); > 75: > 76: // The 'lockedMonitors == true' is what triggers the collection of the > monitor with the dead object. "triggers the collection" sounds like a GC interaction but that is not what you mean. Suggestion: // The 'lockedMonitors == true' is what causes the monitor with a dead object to be examined. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 94: > 92: > 93: // Let the GC clear the weak reference to the object. > 94: system_gc(env); AFAIK there is no guarantee that one call to `System.gc()` will suffice to clear the weakRef. We tend use a loop with a few iterations in other tests, or use a WhiteBox method to achieve it. In my testing I used the finalizer to observe that the objects had been finalized but even then, and with a loop, I did not always see them collected with G1. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 104: > 102: // source of at least two bugs: > 103: // - When the object reference in the monitor was made weak, the code > 104: // didn't unlock the monitor, leaving it lingering in the system. Suggestion: // - When the object reference in the monitor was cleared, the monitor // iterator code would skip it, preventing it from being unlocked when // the owner thread detached, leaving it lingering in the system. the original made it sound to me like the code that cleared the reference (i.e. the GC) was expected to do the unlocking. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 107: > 105: // - When the monitor iterator API was rewritten the code was changed > to > 106: // assert that we didn't have "owned" monitors with dead objects. > This > 107: // test provokes that situation and those asserts. nit: s/those asserts/that assert/ test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 54: > 52: > 53: private static void jniMonitorEnterAndLetObjectDie() { > 54: // The monitor iterator used GetOwnedMonitorInfo used to s/iterator used/iterator used by/ test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 59: > 57: // GetOwnedMonitorInfo testing. > 58: Object obj = new Object() { public String toString() {return "";} > }; > 59: jniMonitorEnter(obj); I would add a check for `Thread.holdsLock(obj);` after this just to be sure it worked. test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 61: > 59: jniMonitorEnter(obj); > 60: obj = null; > 61: System.gc(); Again one gc() is generally not sufficient. How can this test tell that the object in the monitor was actually cleared? I think `monitorinflation` logging may be the only way to tell. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1745590548 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843318 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843678 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843923 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402844118 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402846787 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402846942 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402845852 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402857972 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402859246 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402847898 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402848741 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402848946 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402851705 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402857110 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402852983