Re: RFR: 8305209: JDWP exit error AGENT_ERROR_INVALID_THREAD(203): missing entry in running thread table [v3]

Chris Plummer Wed, 05 Apr 2023 20:35:45 -0700

> The real purpose of this PR is to add virtual thread support to 
> ThreadMemoryLeakTest.java, but this exposed bugs in both the debug agent and 
> in TestScaffold, so those are being fixed also (and the debug agent bug is 
> the CR being used).
> 
> The debug agent bug is due to a race condition during VM exit. The VM is in
> the process of shutting down. The debug agent has already disabled JVMTI
> callbacks and has sent the VMDeathEvent. At this point there are also
> threads exiting that the debug agent knows about, but for which it will not
> get a ThreadEndEvent because the callbacks are disabled. Thus these threads
> remain in the debug agent's list of known threads, even though they have
> exited. The debugger receives the VMDeathEvent and does a VM.resume().
> During the debug agent's handling of the VM.Resume command, it iterates over
> all known threads and needs to map each to its ThreadNode so it can be
> resumed, and this mapping requires accessing the JVMTI TLS for the thread.
> The problem is that some of the threads may have already exited, and
> therefore no longer have TLS. This triggers an assert in the debug agent.
> This debug agent issue was already addressed for platform threads, but not
> for virtual threads, which is why we started seeing it when this test was
> modified. The fix is simply to replicate the platform-thread handling for
> virtual threads.
> 
> The TestScaffold bug is that if the debuggee crashes or asserts, this is
> likely to go unnoticed, especially if it happens during VM exit (when the
> test has essentially already completed). Because of this TestScaffold bug,
> the debug agent bug above did not result in a test failure. After fixing
> TestScaffold to check the exitCode of the debuggee process, the test failed
> as expected until the debug agent was fixed.
> 
> One other thing to point out is the OOME issue I started getting frequently
> when testing with virtual threads. Since virtual threads are created at a
> much higher rate than platform threads, their creation started to overwhelm
> the debugger (actually the JDI implementation). There is already a mechanism
> in place to do a VM.HoldEvents if JDI has queued up 10,000 events. The
> problem is that events are coming in so fast that even after doing the
> VM.HoldEvents, the number of queued events continues to go up for a while,
> and sometimes reaches 30,000 or more. This raises the peak memory usage of
> the test quite a bit. Since the test purposely uses a small heap so that a
> memory leak is quickly and reliably detected, the large queue often results
> in an OOME. Because of this, I made virtual threads sleep for 100ms instead
> of 50ms to slow down their creation, which resolved the issue.
> 
> I tested by running all of test/jdk/com/sun/jdi 25 times on each platform 
> with and without virtual thread testing enabled.
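The VM.Resume race described in the PR can be illustrated with a small model. This is a Java sketch for illustration only: the real debug agent is written in C, and `ResumeModel`, its `tls` map (a stand-in for JVMTI thread-local storage) and this `ThreadNode` class are hypothetical. It assumes the existing platform-thread handling amounts to skipping a known thread whose TLS lookup fails, rather than asserting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the debug agent's per-thread bookkeeping.
final class ThreadNode {
    final String name;
    int suspendCount;

    ThreadNode(String name, int suspendCount) {
        this.name = name;
        this.suspendCount = suspendCount;
    }
}

// Hypothetical model of the resume loop; not the debug agent's actual code.
final class ResumeModel {
    // Threads the agent still "knows about"; may include already-exited ones.
    final List<String> knownThreads = new ArrayList<>();
    // Stand-in for JVMTI TLS: an exited thread no longer has an entry.
    final Map<String, ThreadNode> tls = new HashMap<>();

    void start(String name) {
        knownThreads.add(name);
        tls.put(name, new ThreadNode(name, 1));
    }

    // The thread exits after callbacks were disabled: no ThreadEnd event is
    // processed, so it stays in knownThreads, but its TLS is gone.
    void exitWithoutThreadEnd(String name) {
        tls.remove(name);
    }

    // Returns the number of threads actually resumed.
    int resumeAll() {
        int resumed = 0;
        for (String name : knownThreads) {
            ThreadNode node = tls.get(name);
            if (node == null) {
                // Assumed fix: tolerate a thread that exited during VM death
                // instead of treating the missing entry as a fatal error.
                continue;
            }
            if (node.suspendCount > 0) {
                node.suspendCount--;
                resumed++;
            }
        }
        return resumed;
    }
}
```

With three known threads, one of which exits without a ThreadEnd event, `resumeAll()` resumes the two that still have TLS and skips the third, while the stale entry remains in `knownThreads`.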
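The event-queue overshoot described in the OOME paragraph can be modeled as a queue with a hold threshold. This is a hypothetical sketch, not the JDI implementation: `ThrottledEventQueue`, the `releaseThreshold` low-water mark and the two callbacks (standing in for the JDWP VirtualMachine HoldEvents and ReleaseEvents commands) are assumptions. It shows why requesting a hold at 10,000 events does not cap the queue at 10,000.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of event-queue throttling; not JDI's actual code.
final class ThrottledEventQueue<E> {
    private final Deque<E> queue = new ArrayDeque<>();
    private final int holdThreshold;      // e.g. 10,000 per the description
    private final int releaseThreshold;   // hypothetical low-water mark
    private final Runnable holdEvents;    // stands in for VM.HoldEvents
    private final Runnable releaseEvents; // stands in for VM.ReleaseEvents
    private boolean held;
    private int peak;

    ThrottledEventQueue(int holdThreshold, int releaseThreshold,
                        Runnable holdEvents, Runnable releaseEvents) {
        this.holdThreshold = holdThreshold;
        this.releaseThreshold = releaseThreshold;
        this.holdEvents = holdEvents;
        this.releaseEvents = releaseEvents;
    }

    // Called for every event received from the target VM. The hold is only
    // a request: events already in flight still arrive and are enqueued, so
    // size() can exceed holdThreshold.
    void enqueue(E event) {
        queue.addLast(event);
        peak = Math.max(peak, queue.size());
        if (!held && queue.size() >= holdThreshold) {
            held = true;
            holdEvents.run();
        }
    }

    // Removes the oldest event and releases the hold once the queue drains
    // down to the low-water mark. Returns null if the queue is empty.
    E dequeue() {
        E event = queue.pollFirst();
        if (held && queue.size() <= releaseThreshold) {
            held = false;
            releaseEvents.run();
        }
        return event;
    }

    int size() { return queue.size(); }
    int peak() { return peak; }
    boolean isHeld() { return held; }
}
```

With a hold threshold of 10, feeding 30 events before anything is consumed issues a single hold but leaves a peak of 30 queued events, which is the same shape as the 10,000 vs. 30,000 behavior reported above. The PR addressed this on the debuggee side (a longer sleep for virtual threads) rather than by changing the throttling.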


Chris Plummer has updated the pull request incrementally with one additional 
commit since the last revision:

  TestScaffold now waits indefinitely for process exit. Simpler coding of sleep 
time.
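A minimal sketch of the kind of debuggee exit check described for TestScaffold, assuming a clean debuggee exit reports exit code 0. The class `DebuggeeExitCheck` and its methods are hypothetical names, not TestScaffold's actual code. `Process.waitFor()` with no timeout blocks until the process has exited, which matches waiting indefinitely for process exit.

```java
// Hypothetical helper; not TestScaffold's actual code.
final class DebuggeeExitCheck {
    // Assumption: a debuggee that exits cleanly reports exit code 0.
    static void checkExitCode(int exitCode) {
        if (exitCode != 0) {
            throw new RuntimeException("Debuggee exited with non-zero exit code: " + exitCode);
        }
    }

    // Process.waitFor() with no timeout blocks until the process has exited,
    // so a crash or assert during VM exit is still observed.
    static int waitAndCheck(Process debuggee) throws InterruptedException {
        int exitCode = debuggee.waitFor();
        checkExitCode(exitCode);
        return exitCode;
    }
}
```

Waiting without a timeout trades a possible hang (which the test harness's own timeout would eventually catch) for never missing a late crash during VM exit.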

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13246/files
  - new: https://git.openjdk.org/jdk/pull/13246/files/28a337d1..7b02bef2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13246&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13246&range=01-02

  Stats: 19 lines in 2 files changed: 0 ins; 7 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/13246.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13246/head:pull/13246

PR: https://git.openjdk.org/jdk/pull/13246
