On Fri, 17 Jan 2025 18:52:41 +0900 Takashi Yano wrote: > On Wed, 8 Jan 2025 18:05:53 -0800 (PST) > Jeremy Drake wrote: > > On Thu, 9 Jan 2025, Takashi Yano wrote: > > > > > On Wed, 8 Jan 2025 16:48:41 +0100 > > > Corinna Vinschen wrote: > > > > Does this patch fix Bruno's bash issue as well? > > > > > > I'm not sure because it is not reproducible as he said. > > > I also could not reproduce that. > > > > > > However, at least this fixes the issue that Jeremy encountered: > > > https://cygwin.com/pipermail/cygwin/2024-December/256977.html > > > > > > But, even with this patch, Jeremy reported another hang issue > > > that also is not reproducible: > > > https://cygwin.com/pipermail/cygwin/2024-December/256987.html > > > > Yes, this patch helped the hangs I was seeing on Windows on ARM64. > > However, there is still some hang issue in 3.5.5 (which occurs on > > native x86_64) that is not there in 3.5.4. Git for Windows' test suite > > seems to be somewhat reliable at triggering this, but it's hardly a > > minimal test case ;). > > > > Because of this issue, MSYS2 has been keeping 3.5.5 in its 'staging' state > > (rather than deploying it to normal users), and Git for Windows rolled > > back to 3.5.4 before the release of the latest Git RC. > > I might have successfully reproduced this issue. I tried building > cygwin1.dll repeatedly for some of my machines, and one of them > hung in fhandler_pipe::raw_read() as lazka's case: > https://github.com/msys2/msys2-runtime/pull/251#issuecomment-2571338429 > > The call: > L358: waitret = cygwait (select_sem, select_sem_timeout); > never returned even with select_sem_timeout == 1 until a signal > (such as SIGTERM, SIGKILL) arrives. In this situation, attaching > gdb to the process hanging and issuing 'si' command do not return. > Something (stack?) seems to be completely broken. > > I'll try to bisect which commit causes this issue. Please wait > a while.
Done. This issue also seems to be related to the commit: commit d243e51ef1d30312ba1e21b4d25a1ca9a8dc1f63 Author: Takashi Yano <takashi.y...@nifty.ne.jp> Date: Mon Nov 25 19:51:53 2024 +0900 Cygwin: signal: Fix deadlock between main thread and sig thread Previously, a deadlock happened if many SIGSTOP/SIGCONT signals were received rapidly. If the main thread sends __SIGFLUSH at the timing when SIGSTOP is handled by the sig thread, but not is handled by the main thread yet (sig_handle_tty_stop() not called yet), and if SIGCONT is received, the sig thread waits for cygtls::current_sig (is SIGSTOP now) cleared. However, the main thread waits for the pack.wakeup using WaitForSingleObject(), so the main thread cannot handle SIGSTOP. This is the mechanism of the deadlock. This patch uses cygwait() instead of WaitForSingleObject() to be able to handle the pending SIGSTOP. Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html Fixes: 7759daa979c4 ("(sig_send): Fill out sigpacket structure to send to signal thread rather than racily sending separate packets.") Reported-by: Christian Franke <christian.fra...@t-online.de> Reviewed-by: Corinna Vinschen <cori...@vinschen.de> Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> Even though the reason why this issue happens is not clear at all, I perhaps found the solution for that. Applying the attached patch: 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch instead of previous v2 __SIGFLUSHFAST patch solves the both issues. However, strangely enough, the similar patch: ng-0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch which uses cygwait() instead of WF[SM]O does not solve the issue Jeremy reported. The reason is also unclear. What is the difference between cygwait() and WF[SM]O? I expected both patches work almost the same. The v2 __SIGFLUSHFAST patch also uses cygwait(), so the reason might be the same (the reason why we should use WF[SM]O rather than cygwait()). Corinna, any idea? I need some clue. While debugging this problem, I encountered another hang issue, which is fixed by: 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch If we are confident in the patch 0003, I think we should apply 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch 0002-Revert-Cygwin-signal-Do-not-handle-signal-when-__SIG.patch 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch 0004-Revert-Cygwin-signal-Fix-high-load-when-retrying-to-.patch for main branch and 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch for cygwin-3_5-branch. Corinna, what do you think? Jeremy, could you please apply the attached patches: 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch against cygwin-3_5-branch and test if these fix the issue? -- Takashi Yano <takashi.y...@nifty.ne.jp>
From 0330b182621c0dac851c5c5269af2ccd58060f31 Mon Sep 17 00:00:00 2001 From: Takashi Yano <takashi.y...@nifty.ne.jp> Date: Sat, 18 Jan 2025 19:03:23 +0900 Subject: [PATCH 1/4] Cygwin: signal: Avoid frequent tls lock/unlock for SIGCONT processing It seems that current _cygtls::handle_SIGCONT() code sometimes falls into a deadlock due to frequent TLS lock/unlock operation in the yield() loop. With this patch, the yield() in the wait loop for an existing signal is placed outside the TLS lock. Conversely, the yield() in the wait loop for a self-issued SIGCONT is placed inside the TLS lock and do not unlock every time yield() is called. Fixes: 9ae51bcc51a7 ("Cygwin: signal: Fix another deadlock between main and sig thread") Reviewed-by: Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> --- winsup/cygwin/exceptions.cc | 38 +++++++++++---------------- winsup/cygwin/local_includes/cygtls.h | 2 +- 2 files changed, 16 insertions(+), 24 deletions(-) diff --git a/winsup/cygwin/exceptions.cc b/winsup/cygwin/exceptions.cc index 4dc4be278..a36ef4998 100644 --- a/winsup/cygwin/exceptions.cc +++ b/winsup/cygwin/exceptions.cc @@ -1420,7 +1420,7 @@ api_fatal_debug () /* Attempt to carefully handle SIGCONT when we are stopped. */ void -_cygtls::handle_SIGCONT (threadlist_t * &tl_entry) +_cygtls::handle_SIGCONT () { if (NOTSTATE (myself, PID_STOPPED)) return; @@ -1431,28 +1431,24 @@ _cygtls::handle_SIGCONT (threadlist_t * &tl_entry) Make sure that any pending signal is handled before trying to send a new one. Then make sure that SIGCONT has been recognized before exiting the loop. */ - bool sigsent = false; - while (1) - if (current_sig) /* Assume that it's ok to just test sig outside of a - lock since setup_handler does it this way. */ - { - cygheap->unlock_tls (tl_entry); - yield (); /* Attempt to schedule another thread. */ - tl_entry = cygheap->find_tls (_main_tls); - } - else if (sigsent) - break; /* SIGCONT has been recognized by other thread */ - else - { - current_sig = SIGCONT; - set_signal_arrived (); /* alert sig_handle_tty_stop */ - sigsent = true; - } + while (current_sig) + yield (); + + threadlist_t *tl_entry = cygheap->find_tls (this); + + current_sig = SIGCONT; + set_signal_arrived (); /* alert sig_handle_tty_stop */ + + while (current_sig == SIGCONT) + yield (); + /* Clear pending stop signals */ sig_clear (SIGSTOP, false); sig_clear (SIGTSTP, false); sig_clear (SIGTTIN, false); sig_clear (SIGTTOU, false); + + cygheap->unlock_tls (tl_entry); } int @@ -1479,11 +1475,7 @@ sigpacket::process () myself->rusage_self.ru_nsignals++; if (si.si_signo == SIGCONT) - { - tl_entry = cygheap->find_tls (_main_tls); - _main_tls->handle_SIGCONT (tl_entry); - cygheap->unlock_tls (tl_entry); - } + _main_tls->handle_SIGCONT (); /* SIGKILL is special. It always goes through. */ if (si.si_signo == SIGKILL) diff --git a/winsup/cygwin/local_includes/cygtls.h b/winsup/cygwin/local_includes/cygtls.h index 2d490646a..da22e5d14 100644 --- a/winsup/cygwin/local_includes/cygtls.h +++ b/winsup/cygwin/local_includes/cygtls.h @@ -274,7 +274,7 @@ public: /* Do NOT remove this public: line, it's a marker for gentls_offsets. */ { will_wait_for_signal = false; } - void handle_SIGCONT (threadlist_t * &); + void handle_SIGCONT (); static void cleanup_early(struct _reent *); private: void call2 (DWORD (*) (void *, void *), void *, void *); -- 2.45.1
From faea6b05f55541ce60275f7e5d8db1347d648e0d Mon Sep 17 00:00:00 2001 From: Takashi Yano <takashi.y...@nifty.ne.jp> Date: Sat, 18 Jan 2025 19:19:56 +0900 Subject: [PATCH 2/4] Revert "Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent" This reverts commit a22a0ad7c4f0 to apply a new patch for the same purpose. Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> --- winsup/cygwin/release/3.5.6 | 3 --- winsup/cygwin/sigproc.cc | 20 +++++--------------- 2 files changed, 5 insertions(+), 18 deletions(-) diff --git a/winsup/cygwin/release/3.5.6 b/winsup/cygwin/release/3.5.6 index 0fff0de40..d17a6af53 100644 --- a/winsup/cygwin/release/3.5.6 +++ b/winsup/cygwin/release/3.5.6 @@ -7,6 +7,3 @@ Fixes: - Fix a regression since 3.5.0 which fails to use POSIX semantics in unlink/rename on NTFS. - -- Fix zsh hang at startup. - Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256954.html diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc index 35ec3e70e..ba7818a68 100644 --- a/winsup/cygwin/sigproc.cc +++ b/winsup/cygwin/sigproc.cc @@ -751,14 +751,10 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) res = WriteFile (sendsig, leader, packsize, &nb, NULL); if (!res || packsize == nb) break; - if (cygwait (NULL, 10, cw_sig_eintr) == WAIT_SIGNALED - && pack.si.si_signo != __SIGFLUSHFAST) + if (cygwait (NULL, 10, cw_sig_eintr) == WAIT_SIGNALED) _my_tls.call_signal_handler (); res = 0; } - /* Re-assert signal_arrived which has been cleared in cygwait(). */ - if (_my_tls.current_sig) - _my_tls.set_signal_arrived (); if (!res) { @@ -789,16 +785,7 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) if (wait_for_completion) { sigproc_printf ("Waiting for pack.wakeup %p", pack.wakeup); - do - { - rc = cygwait (pack.wakeup, WSSC, cw_sig_eintr); - if (rc == WAIT_SIGNALED && pack.si.si_signo != __SIGFLUSHFAST) - _my_tls.call_signal_handler (); - } - while (rc != WAIT_OBJECT_0 && rc != WAIT_TIMEOUT); - /* Re-assert signal_arrived which has been cleared in cygwait(). */ - if (_my_tls.current_sig) - _my_tls.set_signal_arrived (); + rc = cygwait (pack.wakeup, WSSC); ForceCloseHandle (pack.wakeup); } else @@ -819,6 +806,9 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) rc = -1; } + if (wait_for_completion && si.si_signo != __SIGFLUSHFAST) + _my_tls.call_signal_handler (); + out: if (communing && rc) { -- 2.45.1
From 72044bae562c40f59f20ec69ce8ec695aaff7a5d Mon Sep 17 00:00:00 2001 From: Takashi Yano <takashi.y...@nifty.ne.jp> Date: Sat, 18 Jan 2025 19:21:51 +0900 Subject: [PATCH 3/4] Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent The commit a22a0ad7c4f0 was not exactly the correct thing. Even with the patch, some hangs still happen. This patch overrides the previous commit to fix these hangs. Addresses: https://cygwin.com/pipermail/cygwin/2024-December/256987.html Fixes: a22a0ad7c4f0 ("Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent") Reported-by: Jeremy Drake <cyg...@jdrake.com> Reviewed-by: Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> --- winsup/cygwin/sigproc.cc | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc index ba7818a68..4dcdd94de 100644 --- a/winsup/cygwin/sigproc.cc +++ b/winsup/cygwin/sigproc.cc @@ -751,8 +751,19 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) res = WriteFile (sendsig, leader, packsize, &nb, NULL); if (!res || packsize == nb) break; - if (cygwait (NULL, 10, cw_sig_eintr) == WAIT_SIGNALED) - _my_tls.call_signal_handler (); + if (pack.si.si_signo == __SIGFLUSHFAST) + Sleep (10); + else /* Handle signals */ + do + { + DWORD rc = WaitForSingleObject (_my_tls.get_signal_arrived (), 10); + if (rc == WAIT_OBJECT_0) + { + _my_tls.call_signal_handler (); + continue; + } + } + while (false); res = 0; } @@ -785,7 +796,20 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) if (wait_for_completion) { sigproc_printf ("Waiting for pack.wakeup %p", pack.wakeup); - rc = cygwait (pack.wakeup, WSSC); + if (pack.si.si_signo == __SIGFLUSHFAST) + rc = WaitForSingleObject (pack.wakeup, WSSC); + else /* Handle signals */ + do + { + HANDLE w[2] = {pack.wakeup, _my_tls.get_signal_arrived ()}; + rc = WaitForMultipleObjects (2, w, FALSE, WSSC); + if (rc == WAIT_OBJECT_0 + 1) /* signal arrived */ + { + _my_tls.call_signal_handler (); + continue; + } + } + while (false); ForceCloseHandle (pack.wakeup); } else @@ -806,9 +830,6 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) rc = -1; } - if (wait_for_completion && si.si_signo != __SIGFLUSHFAST) - _my_tls.call_signal_handler (); - out: if (communing && rc) { -- 2.45.1
From d8e6b1f33979b9d7eb0960eca8dbcbf7d2c1f5d1 Mon Sep 17 00:00:00 2001 From: Takashi Yano <takashi.y...@nifty.ne.jp> Date: Sat, 18 Jan 2025 19:51:27 +0900 Subject: [PATCH 4/4] Revert "Cygwin: signal: Fix high load when retrying to process pending signal" This reverts commit 1d1451ccd2a6 because this does not seem to necessary anymore. After the commit "Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent", the problem does not happen even without this patch. Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> --- winsup/cygwin/sigproc.cc | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc index 4dcdd94de..68f84a848 100644 --- a/winsup/cygwin/sigproc.cc +++ b/winsup/cygwin/sigproc.cc @@ -1366,12 +1366,6 @@ wait_sig (VOID *) hntdll = GetModuleHandle ("ntdll.dll"); - /* GetTickCount() here is enough because GetTickCount() - t0 does - not overflow until 49 days psss. Even if GetTickCount() overflows, - GetTickCount() - t0 returns correct value, since underflow in - unsigned wraps correctly. Pending a signal for more than 49 - days makes no sense. */ - DWORD t0 = GetTickCount (); for (;;) { DWORD nb; @@ -1381,7 +1375,7 @@ wait_sig (VOID *) else if (sigq.start.next && PeekNamedPipe (my_readsig, NULL, 0, NULL, &nb, NULL) && !nb) { - Sleep (GetTickCount () - t0 > 10 ? 1 : 0); + yield (); pack.si.si_signo = __SIGFLUSH; } else if (!ReadFile (my_readsig, &pack, sizeof (pack), &nb, NULL)) @@ -1391,8 +1385,6 @@ wait_sig (VOID *) system_printf ("garbled signal pipe data nb %u, sig %d", nb, pack.si.si_signo); continue; } - if (pack.si.si_signo != __SIGFLUSH) - t0 = GetTickCount (); sigq.retry = false; /* Don't process signals when we start exiting */ -- 2.45.1
From 70a69950b21cb5f1f73e5750e25f43aa25b8bf18 Mon Sep 17 00:00:00 2001 From: Takashi Yano <takashi.y...@nifty.ne.jp> Date: Sat, 18 Jan 2025 19:39:26 +0900 Subject: [PATCH 3/3] Cygwin: signal: Do not handle signal when __SIGFLUSHFAST is sent THIS PATCH DOES NOT FIX THE ISSUE THAT JEREMY REPORTED IN: https://cygwin.com/pipermail/cygwin/2024-December/256987.html FOR UNKNOWN REASON. Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> --- winsup/cygwin/sigproc.cc | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/winsup/cygwin/sigproc.cc b/winsup/cygwin/sigproc.cc index ba7818a68..41854b114 100644 --- a/winsup/cygwin/sigproc.cc +++ b/winsup/cygwin/sigproc.cc @@ -751,8 +751,10 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) res = WriteFile (sendsig, leader, packsize, &nb, NULL); if (!res || packsize == nb) break; - if (cygwait (NULL, 10, cw_sig_eintr) == WAIT_SIGNALED) - _my_tls.call_signal_handler (); + if (pack.si.si_signo == __SIGFLUSHFAST) + Sleep (10); + else /* Handle signals in cygwait() */ + cygwait (NULL, 10, cw_sig_restart); res = 0; } @@ -785,7 +787,10 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) if (wait_for_completion) { sigproc_printf ("Waiting for pack.wakeup %p", pack.wakeup); - rc = cygwait (pack.wakeup, WSSC); + if (pack.si.si_signo == __SIGFLUSHFAST) + rc = WaitForSingleObject (pack.wakeup, WSSC); + else /* Handle signals in cygwait() */ + rc = cygwait (pack.wakeup, WSSC, cw_sig_restart); ForceCloseHandle (pack.wakeup); } else @@ -806,9 +811,6 @@ sig_send (_pinfo *p, siginfo_t& si, _cygtls *tls) rc = -1; } - if (wait_for_completion && si.si_signo != __SIGFLUSHFAST) - _my_tls.call_signal_handler (); - out: if (communing && rc) { -- 2.45.1