On Thu, 6 Apr 2023 03:35:17 GMT, Chris Plummer <cjplum...@openjdk.org> wrote:

>> The real purpose of this PR is to add virtual thread support to 
>> ThreadMemoryLeakTest.java, but this exposed bugs in both the debug agent and 
>> in TestScaffold, so those are being fixed also (and the debug agent bug is 
>> the CR being used).
>> 
>> The debug agent bug is due to a race condition during VM exit. The VM is in 
>> the process of shutting down. The debug agent has already disabled JVMTI 
>> callbacks and has sent the VMDeathEvent. At this point in time there are 
>> also threads exiting that the debug agent knows about, but it will not get a 
>> ThreadEndEvent for because of the callbacks being disabled. Thus these 
>> threads remain in the debug agent's list of known threads, even though they 
>> have exited. The debuggee receives the VMDeathEvent and does a VM.resume(). 
>> During the debug agent's handing of the VM.Resume command, it iterates over 
>> all known threads and needs to map each to its ThreadNode so it can be 
>> resumed, and this mapping requires accessing the JVMTI TLS for the thread. 
>> The problem is some of the threads may have exited already, and therefore no 
>> longer have TLS. This results in the assert in the debug agent. This debug 
>> agent issue was already addressed for platform threads, but not for virtual 
>> threads, which is why w
 e started seeing this issue when this test was modified. The fix is to just 
replicate what is done for platform threads for virtual threads also.
>> 
>> The TestScaffold bug is that if the debuggee crashes/asserts, this is likely 
>> to go unnoticed, especially if it happens during VM exit (and the test 
>> essentially has already completed). Because of this TestScaffold bug, the 
>> debug agent bug above did not result in a test failure. After fixing 
>> TestScaffold to check the exitCode of the debuggee process, the test started 
>> to appropriately fail until the debug agent was fixed.
>> 
>> One other thing to point out is the OOME issue I started getting frequently 
>> when testing with virtual threads. Since virtual threads are created at a 
>> much higher rate than platform threads, their creation started to overwhelm 
>> the debugger (actually the JDI implementation). There is already a mechanism 
>> in place to do a VM.HoldEvents if JDI has queue up 10,000 events. The 
>> problem is that events are coming in so fast that even after doing the 
>> VM.HoldEvents, the number of queued events continues to go up for a while, 
>> and sometimes reaches 30,000 or more. This raises the peak memory usage of 
>> the test quite a bit. Since the test purposely uses a small heap so a memory 
>> leak is quickly and reliably detected, the large queue often results in an 
>> OOME. Because of this I make virtual threads sleep for 100ms instead of 50ms 
>> to slow down their creation, and this resolved the issue. 
>> 
>> I tested by running all of test/jdk/com/sun/jdi 25 times on each platform 
>> with and without virtual thread testing enabled.
>
> Chris Plummer has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   TestScaffold now waits indefinitely for process exit. Simpler coding of 
> sleep time.

Marked as reviewed by sspitsyn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/13246#pullrequestreview-1374133992

Reply via email to