On Tue, 16 May 2023 14:33:05 GMT, Patricio Chilano Mateo 
<pchilanom...@openjdk.org> wrote:

>> The following patch fixes a bug introduced while refactoring the 
>> VirtualThreadStart/End events. Specifically, the code to delete the 
>> JvmtiThreadState of a terminating vthread was moved before we start the VTMS 
>> transition. That allowed said code to run concurrently with 
>> recompute_enabled() leading to different crashing modes. I wrote the 
>> detailed sequence of events leading to the crash in the bug comments. 
>> 
>> To fix it I moved the cleanup code back after the call to 
>> VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to 
>> that of the carrier has to be done after this cleanup code is executed, 
>> otherwise we would delete the wrong JvmtiThreadState state, I had to add a 
>> boolean argument to VTMS_unmount_begin() to differentiate the last unmount 
>> call from the other ones. This is unfortunate since ideally 
>> VTMS_unmount_begin() would be oblivious to these two cases as with 
>> VTMS_mount_end() where we don't need to check if this is the first mount.
>> I looked for other ways to solve it instead of the extra boolean argument 
>> but wasn't convinced. One way would be to have another 
>> JvmtiExport::cleanup_thread() that would handle this case. Another way which 
>> is very simple is to move the rebind_to_jvmti_thread_state_of() call to 
>> VTMS_unmount_end() instead. But that means during the transition the 
>> _jvmti_thread_state field of the carrier would be either null or that of the 
>> vthread, unlike today which is always that of the carrier during the 
>> transitions. I didn't want to change that behavior in this fix but I can 
>> also explore that route.
>> 
>> I tested the patch with the reproducer I attached to the bug, plus I also 
>> run tiers1-3 in mach5.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   remove extra assert

Summary: Thumbs up. This is most definitely NOT a trivial fix.

Normally the way I like to review these kinds of fixes is to map the failure
modes back to the fix just to make sure I understand how each of the
failure modes is covered by the changes. For this particular bug that has
NOT been an easy thing to do. I believe the failure modes are complicated
to follow because we "lost" the synchronization of being in the
`VTMS_unmount_begin` transition which allowed a thread calling
`recompute_enabled()` to race with our ending/exiting vthread/cthread
combo. Ouch.

The fix does restore the synchronization of being in the
`VTMS_unmount_begin` transition so I can see how we would no longer
be racing with the `recompute_enabled()` calling thread.

-------------

Marked as reviewed by dcubed (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1429333688

Reply via email to