On Fri, 7 Mar 2025 16:29:51 +0900
Takashi Yano wrote:
> On Wed, 5 Mar 2025 11:23:26 +0100
> Christian Franke wrote:
> > Takashi Yano via Cygwin wrote:
> > > On Mon, 24 Feb 2025 11:29:59 +0100
> > > Christian Franke wrote:
> > >> Found with 'stress-ng --cpu-sched 1':
> > >>
> > >> Testcase (attached):
> > >>
> > >> $ uname -r
> > >> 3.6.0-0.387.g8cebbb2b42bf.x86_64
> > >>
> > >> $ gcc -o timersig timersig.c
> > >>
> > >> $ ./timersig
> > >> 638: fork()=639
> > >> !!!!!!!!!!!!!...!!!!!!!!!!!!!SIGSTOP: Permission denied
> > >>       0 [itimer] timersig 639 sig_send: error sending signal 14, pid 639,
> > >> pipe handle 0x14C, nb 0, packsize 192, Win32 error 0
> > >> SIGKILL: Permission denied
> > >>
> > >> $ kill 639
> > >> -bash: kill: (639) - Permission denied
> > >>
> > >> $ kill -9 639
> > >> -bash: kill: (639) - Permission denied
> > >>
> > >> $ /bin/kill --force 639
> > >>
> > >> $ /bin/kill --force 639
> > >> kill: 639: No such process
> > >>
> > >>
> > >> A similar problem, but without the "error sending signal" message,
> > >> occurs if the timer is not used but the parent process issues SIGSTOP
> > >> SIGALRM SIGCONT ... sequences.
> > > Thanks for the report, especially for the test case. I was able to
> > > easily reproduce the issue. However, I haven't found the cause until
> > > today. I spent 3 days investigating and discovered three bugs that
> > > prevent the test case from behaving as expected.
> > >
> > > I'll submit the patch series shortly.
> >
> > Testcase works as expected with 3.6.0-0.419.g3c1308ed890e.x86_64, thanks!
> >
> >
> > Unfortunately signals may be lost, a new testcase is attached:
> >
> > $ uname -r
> > 3.6.0-0.419.g3c1308ed890e.x86_64
> >
> > $ gcc -o lostsig lostsig.c
> >
> > $ ./lostsig
> > 1157: fork()=1158
> > SIGALRM x 10
> > [ALRM]
> > [ALRM]
> > [ALRM]
> > SIGSTOP
> > [ALRM]
> > SIGTERM
> > SIGCONT
> > waitpid()...
> > [TERM]
> > 1158: 4 SIGALRM received, exit(42)
> > waidpid()=1158, status=0x2a00
> >
> > $ ./lostsig
> > 1163: fork()=1164
> > SIGALRM x 10
> > SIGSTOP
> > SIGTERM
> > SIGCONT
> > waitpid()...
> > [ALRM]
> > [TERM]
> > ...hangs...
> >
> >
> > A 'ps' in a second terminal then shows that the child process is still
> > in S)topped state. 'kill -CONT ...' works to continue.
> >
> > If the testcase is assigned to a single core with 'taskset 0x1 ...', it
> > apparently always hangs.
>
> Thanks for the report and the testcase.
> The current implementation of the signal queue has the following problems:
> 1) Signals in the queue are processed in a disordered manner.
> 2) If the same signal is already in the queue, the new signal is discarded.
>
> I am working on this issue and almost finished.
>
> Now I'm testing. Please wait a while.
BTW, the result of your testcase on Linux is as follows:

231873: fork()=231874
SIGALRM x 10
[ALRM]
[ALRM]
[ALRM]
SIGSTOP
SIGTERM
SIGCONT
waitpid()...
[TERM]
231874: 3 SIGALRM received, exit(42)
waidpid()=231874, status=0x2a00

Signals are lost on Linux as well. However, the testcase does not hang
there, so I guess SIGSTOP/SIGCONT are never lost on Linux.

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>