On Wed, 1 May 2024 21:32:46 GMT, Chris Plummer <cjplum...@openjdk.org> wrote:
> This PR adds ranked monitor support to the debug agent. The debug agent has a > large number of monitors, and it's really hard to know which order to grab > them in, and for that matter which monitors might already be held at any > given moment. By imposing a rank on each monitor, we can check to make sure > they are always grabbed in the order of their rank. Having this in place when > I was working on [JDK-8324868](https://bugs.openjdk.org/browse/JDK-8324868) > would have made it much easier to detect a deadlock that was occuring, and > the reason for it. That's what motivated me to do this work > > There were 2 or 3 minor rank issues discovered as a result of these changes. > I also learned a lot about some of the more ugly details of the locking > support in the process. > > Tested with the following on all supported platforms and with virtual threads: > > com/sun/jdi > vmTestbase/nsk/jdi > vmTestbase/nsk/jdb > vmTestbase/nsk/jdwp > > Still need to run tier2 and tier5. > > Details of the changes follow in the first comment. src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 656: > 654: commonRef_lock(); > 655: if (gdata->rankedMonitors) { > 656: dbgRawMonitor_lock(); What is this call for? I think dbgRawMonitor_lock/dbgRawMonitor_unlock should not be used outside of util.c (I'd remove them from util.h) src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1204: > 1202: // Need to lock during initialization so verifyMonitorRank() can be > guaranteed that > 1203: // if the monitor field is set, then the monitor if fully > initialized. More specifically, > 1204: // that the rank field has been set. rank for all monitors can be set in `util_initialize()`: for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) { dbg_monitors[i]->rank = i; } src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1231: > 1229: } > 1230: > 1231: dbg_monitor->monitor = NULL; I think it would be better to protect this with dbgRawMonitor src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1243: > 1241: error = JVMTI_FUNC_PTR(gdata->jvmti,GetThreadInfo) > 1242: (gdata->jvmti, thread, &info); > 1243: return info.name; Need to delete JNI reference info.thread_group and info.jthreadGroup (if not NULL) src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1262: > 1260: for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) { > 1261: DebugRawMonitor* dbg_monitor = &dbg_monitors[i]; > 1262: if (dbg_monitor == NULL) { Should be `dbg_monitor->monitor` src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1279: > 1277: return; // This assert is not reliable if the VM is exiting > 1278: } > 1279: jthread current_thread = threadControl_currentThread(); Looks like all callers of the `assertIsCurrentThread()` have reference to the current thread. Maybe pass it as argument? src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1281: > 1279: jthread current_thread = threadControl_currentThread(); > 1280: if (!isSameObject(env, thread, current_thread)) { > 1281: tty_message("ERROR: Threads not the same: %p %p\n", thread, > current_thread); Not sure the addresses provide useful information. Maybe print thread names? src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1289: > 1287: > 1288: static void > 1289: verifyMonitorRank(JNIEnv *env, DebugRawMonitorRank rank, jthread thread) please add a comment that the function should be called under dbgRawMonitor lock (also for other functions which assume this) src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1294: > 1292: // has a higher rank than the monitor we are about to enter. > 1293: DebugRawMonitorRank i; > 1294: for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) { No need to check monitors from the beginning. Should be enough to check starting from `min(rank, FIRST_LEAF_DEBUG_RAW_MONITOR)` src/jdk.jdwp.agent/share/native/libjdwp/util.c line 1470: > 1468: > 1469: // Assert that the current thread owns this monitor. > 1470: assertIsCurrentThread(getEnv(), dbg_monitor->ownerThread); reading ownerThread (and other) fields without dbgRawMonitor is not MT-safe (it's not atomic) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586999913 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588259933 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586968454 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586970362 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586973059 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586983415 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586984623 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588206582 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1586976698 PR Review Comment: https://git.openjdk.org/jdk/pull/19044#discussion_r1588252771