On Sat, 18 Jan 2025 20:41:37 +0900 Takashi Yano wrote: > On Fri, 17 Jan 2025 18:52:41 +0900 > Takashi Yano wrote: > > On Wed, 8 Jan 2025 18:05:53 -0800 (PST) > > Jeremy Drake wrote: > > > On Thu, 9 Jan 2025, Takashi Yano wrote: > > > > > > > On Wed, 8 Jan 2025 16:48:41 +0100 > > > > Corinna Vinschen wrote: > > > > > Does this patch fix Bruno's bash issue as well? > > > > > > > > I'm not sure because it is not reproducible as he said. > > > > I also could not reproduce that. > > > > > > > > However, at least this fixes the issue that Jeremy encountered: > > > > https://cygwin.com/pipermail/cygwin/2024-December/256977.html > > > > > > > > But, even with this patch, Jeremy reported another hang issue > > > > that also is not reproducible: > > > > https://cygwin.com/pipermail/cygwin/2024-December/256987.html > > > > > > Yes, this patch helped the hangs I was seeing on Windows on ARM64. > > > However, there is still some hang issue in 3.5.5 (which occurs on > > > native x86_64) that is not there in 3.5.4. Git for Windows' test suite > > > seems to be somewhat reliable at triggering this, but it's hardly a > > > minimal test case ;). > > > > > > Because of this issue, MSYS2 has been keeping 3.5.5 in its 'staging' state > > > (rather than deploying it to normal users), and Git for Windows rolled > > > back to 3.5.4 before the release of the latest Git RC. > > > > I might have successfully reproduced this issue. I tried building > > cygwin1.dll repeatedly for some of my machines, and one of them > > hung in fhandler_pipe::raw_read() as lazka's case: > > https://github.com/msys2/msys2-runtime/pull/251#issuecomment-2571338429 > > > > The call: > > L358: waitret = cygwait (select_sem, select_sem_timeout); > > never returned even with select_sem_timeout == 1 until a signal > > (such as SIGTERM, SIGKILL) arrives. In this situation, attaching > > gdb to the process hanging and issuing 'si' command do not return. > > Something (stack?) seems to be completely broken. > > > > I'll try to bisect which commit causes this issue. Please wait > > a while. > > Done. > > This issue also seems to be related to the commit: > > commit d243e51ef1d30312ba1e21b4d25a1ca9a8dc1f63 > Author: Takashi Yano <takashi.y...@nifty.ne.jp> > Date: Mon Nov 25 19:51:53 2024 +0900 > > Cygwin: signal: Fix deadlock between main thread and sig thread > > Previously, a deadlock happened if many SIGSTOP/SIGCONT signals were > received rapidly. If the main thread sends __SIGFLUSH at the timing > when SIGSTOP is handled by the sig thread, but not is handled by the > main thread yet (sig_handle_tty_stop() not called yet), and if SIGCONT > is received, the sig thread waits for cygtls::current_sig (is SIGSTOP > now) cleared. However, the main thread waits for the pack.wakeup using > WaitForSingleObject(), so the main thread cannot handle SIGSTOP. This > is the mechanism of the deadlock. This patch uses cygwait() instead of > WaitForSingleObject() to be able to handle the pending SIGSTOP. > > Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html > Fixes: 7759daa979c4 ("(sig_send): Fill out sigpacket structure to send to > signal thread rather than racily sending separate packets.") > Reported-by: Christian Franke <christian.fra...@t-online.de> > Reviewed-by: Corinna Vinschen <cori...@vinschen.de> > Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> > > Even though the reason why this issue happens is not clear at all, > I perhaps found the solution for that. > > Applying the attached patch: > 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch > instead of previous v2 __SIGFLUSHFAST patch solves the both issues. > > However, strangely enough, the similar patch: > ng-0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch > which uses cygwait() instead of WF[SM]O does not solve the issue > Jeremy reported. > > The reason is also unclear. What is the difference between cygwait() > and WF[SM]O? I expected both patches work almost the same. The v2 > __SIGFLUSHFAST patch also uses cygwait(), so the reason might be > the same (the reason why we should use WF[SM]O rather than cygwait()). > > Corinna, any idea? I need some clue. > > > While debugging this problem, I encountered another hang issue, > which is fixed by: > 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch > > If we are confident in the patch 0003, I think we should apply > 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch > 0002-Revert-Cygwin-signal-Do-not-handle-signal-when-__SIG.patch > 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch > 0004-Revert-Cygwin-signal-Fix-high-load-when-retrying-to-.patch > for main branch and > 0001-Cygwin-signal-Avoid-frequent-tls-lock-unlock-for-SIG.patch > 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch > for cygwin-3_5-branch.
I am testing with applying 0001, 0002 and 0003 patches against main branch for three hang issues, https://cygwin.com/pipermail/cygwin/2024-November/256744.html https://cygwin.com/pipermail/cygwin/2024-December/256954.html https://cygwin.com/pipermail/cygwin/2024-December/256987.html that could be reproduced in my environment related to the patch: "Cygwin: signal: Fix deadlock between main thread and sig thread" I haven't seen any hangs in the tree test cases repeatedly running for more than 12 hours so far. -- Takashi Yano <takashi.y...@nifty.ne.jp>