On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee <jver...@openjdk.org> wrote:

> There is a subtle race in `UpcallLinker::on_exit` between copying of the old 
> frame anchor back into place, and the GC. Since this copy is not atomic, it 
> may briefly appear as if a thread has no last Java frame, while still in the 
> `_thread_in_native` state, which leads to the GC skipping processing of any 
> active Java frames.
> 
> This code was originally adapted from `JavaCallWrapper::!JavaCallWrapper` - 
> the JNI mechanism for upcalls - but in that case the frame anchor copy 
> happens in the `_thread_in_vm` state, which means the GC will wait for the 
> thread to get to a safepoint.
> 
> The solution proposed here is to do the frame anchor copy in the java thread 
> state, before transitioning  back to the native state. The java thread state, 
> like the vm thread state, is also 'safe' i.e. the GC will wait for the thread 
> to get to a safepoint, so we can safely do our non-atomic copy of the frame 
> anchor.
> 
> Additionally, this PR resolves a similar issue in `on_entry`, by moving the 
> clearing of the pending exception (in case native code use a JNI API and 
> didn't handle the exception afterwards). We now also skip checking for async 
> exceptions when transitioning from native to java, so we don't immediately 
> clear them. Any async exceptions will be picked up at the next safepoint 
> instead.
> 
> Special thanks to @stefank and @fisk for finding the root cause, and 
> @jaikiran for testing and debugging.
> 
> Testing: tier 1-4, 20k runs of the failing test on linux-aarch64.

lgtm.

Would be nice if if we could assert that we are not in native or blocked when 
touching the oops as well. Similarly to modifications of the frame anchor. But 
I agree that it should be done separately.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21742#pullrequestreview-2464545413

Reply via email to