Hi Jeremy,

On Tue, 12 Nov 2024, Jeremy Drake via Cygwin-patches wrote:

> From: Jeremy Drake <cyg...@jdrake.com>
>
> This addresses an extremely difficult to debug deadlock when running
> under emulation on ARM64.
>
> A relatively easy way to trigger this bug is to call `fork()`, then within the
> child process immediately call another `fork()` and then `exit()` the
> intermediate process.
>
> It would seem that there is a "code emulation" lock on the wait thread at
> this stage, and if the thread is terminated too early, that lock still exists
> albeit without a thread, and nothing moves forward.
>
> It seems that a `SuspendThread()` combined with a `GetThreadContext()`
> (to force the thread to _actually_ be suspended, for more details see
> https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743)
> makes sure the thread is "booted" from emulation before it is suspended.
>
> Hopefully this means it won't be holding any locks or otherwise leave
> emulation in a bad state when the thread is terminated.
>
> Also, attempt to use `CancelSynchonousIo()` (as seen in `flock.cc`) to avoid
> the need for `TerminateThread()` altogether.  This doesn't always work,
> however, so was not a complete fix for the deadlock issue.
>
> Addresses: https://cygwin.com/pipermail/cygwin-developers/2024-May/012694.html
> Signed-off-by: Jeremy Drake <cyg...@jdrake.com>

Excellent work! Thank you for your impressive tenacity to stick with this
problem. I built an MSYS2 runtime with the fix in
https://github.com/git-for-windows/msys2-runtime/pull/73, and then started
your reproducer from
https://inbox.sourceware.org/cygwin-developers/78f294de-4c94-242a-722e-fd98e51ed...@jdrake.com/,
and it failed to dead-lock so far (it's been running for almost an hour).

In other words, I would like to offer my support for including this in
Cygwin so that these dead-locks on Windows/ARM64 will be a thing of the
past.

Thank you so, so much,
Johannes

> ---
>  winsup/cygwin/cygthread.cc | 14 ++++++++++++++
>  winsup/cygwin/sigproc.cc   |  3 ++-
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/winsup/cygwin/cygthread.cc b/winsup/cygwin/cygthread.cc
> index 54918e7677..4f16097531 100644
> --- a/winsup/cygwin/cygthread.cc
> +++ b/winsup/cygwin/cygthread.cc
> @@ -302,6 +302,20 @@ cygthread::terminate_thread ()
>    if (!inuse)
>      goto force_notterminated;
>
> +  if (_my_tls._ctinfo != this)
> +    {
> +      CONTEXT context;
> +      context.ContextFlags = CONTEXT_CONTROL;
> +      /* SuspendThread makes sure a thread is "booted" from emulation before
> +      it is suspended.  As such, the emulator hopefully won't be in a bad
> +      state (aka, holding any locks) when the thread is terminated. */
> +      SuspendThread (h);
> +      /* We need to call GetThreadContext, even though we don't care about 
> the
> +      context, because SuspendThread is asynchronous and GetThreadContext
> +      will make sure the thread is *really* suspended before returning */
> +      GetThreadContext (h, &context);
> +    }
> +
>    TerminateThread (h, 0);
>    WaitForSingleObject (h, INFINITE);
>    CloseHandle (h);
> diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc
> index 81b6c31695..360bdac232 100644
> --- a/winsup/cygwin/sigproc.cc
> +++ b/winsup/cygwin/sigproc.cc
> @@ -410,7 +410,8 @@ proc_terminate ()
>         if (!have_execed || !have_execed_cygwin)
>           chld_procs[i]->ppid = 1;
>         if (chld_procs[i].wait_thread)
> -         chld_procs[i].wait_thread->terminate_thread ();
> +         if (!CancelSynchronousIo (chld_procs[i].wait_thread->thread_handle 
> ()))
> +           chld_procs[i].wait_thread->terminate_thread ();
>         /* Release memory associated with this process unless it is 'myself'.
>            'myself' is only in the chld_procs table when we've execed.  We
>            reach here when the next process has finished initializing but we
> --
> 2.47.0.windows.2
>
>

Reply via email to