On Fri, 28 Oct 2022 00:52:31 GMT, Serguei Spitsyn <sspit...@openjdk.org> wrote:

>> This PR disables VIRTUAL_THREAD_START/END events when no debugger is 
>> attached. Some customers like to launch (in production mode) the JVM with 
>> the debug agent enabled, but with no debugger attached. This usually has a 
>> very minimal performance impact as long as the debugger remains unattached. 
>> Launching the JVM in this manner allows a debugger to later be attached if 
>> there are issues.
>> 
>> In the past the debug agent has always kept THREAD_START/END events enabled, 
>> even when no debugger is attached. It does so to keep track of all existing 
>> threads, and this generally has minimal performance impact. However, with 
>> virtual threads there can be a very large number of virtual threads 
>> constantly being created and destroyed. With SkyNet, this results in as much 
>> as a 250x performance regression when the debug agent is enabled. A large 
>> part of this is due to all the VIRTUAL_THREAD_START/END events the debug 
>> agent has to deal with. When these events are disabled, the performance 
>> slowdown is more like 10x. This remaining 10x comes from JVMTI and is not 
>> addressed in this CR.
>> 
>> The fix for this performance issue is to disable VIRTUAL_THREAD_START/END 
>> events when the debugger is not attached. This also requires forgetting 
>> about all known virtual threads when the debugger detaches since there will 
>> be no VIRTUAL_THREAD_END event to notify the debug agent when the virtual 
>> thread has died.
>> 
>> I decided it was appropriate to not introduce this behavior change if the 
>> debug agent is launched with `includevirtualthreads=y` (the default is 'n'). 
>> `includevirtualthreads=y` is meant to help legacy debuggers, and indicates 
>> that the debug agent should try to track (remember) all vthreads that are 
>> running, and return them when `VirtualMachine.GetAllThreads` is called. 
>> `includevirtualthreads=y` sets `gdata->includeVThreads` true, so initially I 
>> had these changes conditional on `gdata->includeVThreads`. However, most of 
>> our testing is done with `includevirtualthreads=y` (an unfortunate necessity 
>> because of how most of the test are written). This made it hard to test 
>> these changes. So I introduced `gdata->rememberVThreadsWhenDisconnected`, 
>> and made the changes conditional on it instead. It defaults to be the same 
>> as `gdata->includeVThreads`. However, I did some testing with it manually 
>> set it to always be false to get better testing coverage. This could be made 
>> a debug agent flag se
 ttable on the command line, but I opted not to since it's not really of 
interest to the typical debug agent user. I suppose it could be an undocumented 
flag just to use in testing. I can make this change if others agree.
>> 
>> A couple notes on code flow that might help with understanding the changes:
>> - When the connection is initiated, `debugLoop_run()` is called, and it 
>> calls `eventHandler_onConnect()` to enable the START/END callbacks.
>> - When the connection is dropped, `debugLoop_run()` calls 
>> `debugInit_reset()`, which calls `eventHandler_reset()` to disable the 
>> START/END callbacks. A bit later `debugInit_reset()` calls 
>> `threadControl_reset()` to free up the vthread ThreadNodes.
>
> src/jdk.jdwp.agent/share/native/libjdwp/eventHandler.c line 1684:
> 
>> 1682:     }
>> 1683:     debugMonitorExit(callbackLock);
>> 1684:     waiting_for_active_callbacks = JNI_FALSE;
> 
> The set operations at lines 1678 and 1684 are not protected by any lock.
> Would it be more safe to put them inside critical section at lines 1679-1683?
> The variable `waiting_for_active_callbacks` is checked by the event callbacks.

Only the "JDWP Command Reader" thread can ever enter this code. So we have one 
writer but multiple readers. The readers all have to grab the callbackLock 
before reading `waiting_for_active_callbacks`, and also before touching 
`active_callbacks`. It is possible that the flag can be set and then 
`END_CALLBACK()` does the notification before the above code enters the lock 
and waits. But that's ok because it also means that `active_callbacks` is set 
to 0, so no wait will be done. If another event thread races and sets it back 
up to 1, then that is also ok because that means END_CALLBACK() will be invoked 
again and do the notification again.

However, there also seems to be no reason not to move the code withing the 
critical section, so I can do that if you'd like. The reason it ended up 
outside is because I used to have a different flag (I think it was called 
`gdata->resetting`) that was set in `debugInit_reset()` (a couple of frames 
above this one), but I decided it would be easier to understand if I limited 
the scope of when the flag was set, and also give it a name that implied a 
narrower scope of use. However, I maintained the same semantics of setting it 
outside the lock.

-------------

PR: https://git.openjdk.org/jdk/pull/10865

Reply via email to