On Sat, 18 Jan 2025 20:41:37 +0900 Takashi Yano wrote: > On Fri, 17 Jan 2025 18:52:41 +0900 > Takashi Yano wrote: > > On Wed, 8 Jan 2025 18:05:53 -0800 (PST) > > Jeremy Drake wrote: > > > On Thu, 9 Jan 2025, Takashi Yano wrote: > > > > > > > On Wed, 8 Jan 2025 16:48:41 +0100 > > > > Corinna Vinschen wrote: > > > > > Does this patch fix Bruno's bash issue as well? > > > > > > > > I'm not sure because it is not reproducible as he said. > > > > I also could not reproduce that. > > > > > > > > However, at least this fixes the issue that Jeremy encountered: > > > > https://cygwin.com/pipermail/cygwin/2024-December/256977.html > > > > > > > > But, even with this patch, Jeremy reported another hang issue > > > > that also is not reproducible: > > > > https://cygwin.com/pipermail/cygwin/2024-December/256987.html > > > > > > Yes, this patch helped the hangs I was seeing on Windows on ARM64. > > > However, there is still some hang issue in 3.5.5 (which occurs on > > > native x86_64) that is not there in 3.5.4. Git for Windows' test suite > > > seems to be somewhat reliable at triggering this, but it's hardly a > > > minimal test case ;). > > > > > > Because of this issue, MSYS2 has been keeping 3.5.5 in its 'staging' state > > > (rather than deploying it to normal users), and Git for Windows rolled > > > back to 3.5.4 before the release of the latest Git RC. > > > > I might have successfully reproduced this issue. I tried building > > cygwin1.dll repeatedly for some of my machines, and one of them > > hung in fhandler_pipe::raw_read() as lazka's case: > > https://github.com/msys2/msys2-runtime/pull/251#issuecomment-2571338429 > > > > The call: > > L358: waitret = cygwait (select_sem, select_sem_timeout); > > never returned even with select_sem_timeout == 1 until a signal > > (such as SIGTERM, SIGKILL) arrives. In this situation, attaching > > gdb to the process hanging and issuing 'si' command do not return. > > Something (stack?) seems to be completely broken. > > > > I'll try to bisect which commit causes this issue. Please wait > > a while. > > Done. > > This issue also seems to be related to the commit: > > commit d243e51ef1d30312ba1e21b4d25a1ca9a8dc1f63 > Author: Takashi Yano <takashi.y...@nifty.ne.jp> > Date: Mon Nov 25 19:51:53 2024 +0900 > > Cygwin: signal: Fix deadlock between main thread and sig thread > > Previously, a deadlock happened if many SIGSTOP/SIGCONT signals were > received rapidly. If the main thread sends __SIGFLUSH at the timing > when SIGSTOP is handled by the sig thread, but not is handled by the > main thread yet (sig_handle_tty_stop() not called yet), and if SIGCONT > is received, the sig thread waits for cygtls::current_sig (is SIGSTOP > now) cleared. However, the main thread waits for the pack.wakeup using > WaitForSingleObject(), so the main thread cannot handle SIGSTOP. This > is the mechanism of the deadlock. This patch uses cygwait() instead of > WaitForSingleObject() to be able to handle the pending SIGSTOP. > > Addresses: https://cygwin.com/pipermail/cygwin/2024-November/256744.html > Fixes: 7759daa979c4 ("(sig_send): Fill out sigpacket structure to send to > signal thread rather than racily sending separate packets.") > Reported-by: Christian Franke <christian.fra...@t-online.de> > Reviewed-by: Corinna Vinschen <cori...@vinschen.de> > Signed-off-by: Takashi Yano <takashi.y...@nifty.ne.jp> > > Even though the reason why this issue happens is not clear at all, > I perhaps found the solution for that. > > Applying the attached patch: > 0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch > instead of previous v2 __SIGFLUSHFAST patch solves the both issues. > > However, strangely enough, the similar patch: > ng-0003-Cygwin-signal-Do-not-handle-signal-when-__SIGFLUSHFA.patch > which uses cygwait() instead of WF[SM]O does not solve the issue > Jeremy reported. > > The reason is also unclear. What is the difference between cygwait() > and WF[SM]O? I expected both patches work almost the same. The v2 > __SIGFLUSHFAST patch also uses cygwait(), so the reason might be > the same (the reason why we should use WF[SM]O rather than cygwait()). > > Corinna, any idea? I need some clue.
I might have understood. cygwait() is not reentrant in the same thread due to cw_timer which is in TLS, is it? If the signal handler is called during cygwait(), then kill() is invoked within the signal handler, cygwait() will be called recursively if it is used in sig_send(). This might destroy cw_timer... However, I wonder if cw_timer is re-set by NtSetTimer() in the cygwait(), it will be set to WSSC (60 sec) (or 10msec) in the sig_send(), so the hang should end with in at most 60 sec unlike the the hang Jeremy reported. I should still overlook something. -- Takashi Yano <takashi.y...@nifty.ne.jp>