On Thu, 11 Jul 2024 01:57:33 GMT, David Holmes <dhol...@openjdk.org> wrote:

>> I believe we've done quite a few short sleeps like this in the past. Scaling 
>> is not really an issue. It should only require at most a few ms, even with 
>> -Xcomp, so we wait 1000ms and then never have to think about the timing 
>> again. This test is ugly. Sometimes you have to fight ugly with ugly.
>
>> mainThead is suspended and is holding a println related lock, and thread2 is 
>> stuck on the 2nd log call in runt2 awaiting the same lock.
> 
> The classic example of why suspension is fraught with peril - the target must 
> be guaranteed not to be holding any resource needed by the suspender. I think 
> removing the logging may be the best bet here - with comments in the code to 
> ensure someone does not add it back. Or else use a more primitive (native?) 
> mechanism to do the logging, not  System.out.println().

Which logging should be removed? Alex suggest the logging between breakpoints 2 
and 3, but even that is not enough. There is logging after breakpoint 3, and 
that happens before the vm.resume() is done. I'm not saying this can't be done 
right, but I think pruning out logging like this in order to fix the problem 
not only removes some valuable logging from the test, but still leaves us open 
to this type of issue.

I think the safer thing to do is to make sure mainThread is no longer logging 
(or will attempt to log) when the vmsuspend is done. This could be done by 
pruning some of its logging, but that has the same negatives as removing some 
thread2 logging. My sleep suggestion is by far the simplest. The "purist" fix 
would probably be the checkpoint approach (don't do the vm.suspend until 
mainThread has reached a stable point). That ended up getting a bit uglier than 
I had hoped, but I can finish up the work so you can have a look at it if you'd 
like.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20088#discussion_r1673391953

Reply via email to