On Wed, 5 Mar 2025 11:23:26 +0100
Christian Franke wrote:
> Takashi Yano via Cygwin wrote:
> > On Mon, 24 Feb 2025 11:29:59 +0100
> > Christian Franke wrote:
> >> Found with 'stress-ng --cpu-sched 1':
> >>
> >> Testcase (attached):
> >>
> >> $ uname -r
> >> 3.6.0-0.387.g8cebbb2b42bf.x86_64
> >>
> >> $ gcc -o timersig timersig.c
> >>
> >> $ ./timersig
> >> 638: fork()=639
> >> !!!!!!!!!!!!!...!!!!!!!!!!!!!SIGSTOP: Permission denied
> >> 0 [itimer] timersig 639 sig_send: error sending signal 14, pid 639,
> >> pipe handle 0x14C, nb 0, packsize 192, Win32 error 0
> >> SIGKILL: Permission denied
> >>
> >> $ kill 639
> >> -bash: kill: (639) - Permission denied
> >>
> >> $ kill -9 639
> >> -bash: kill: (639) - Permission denied
> >>
> >> $ /bin/kill --force 639
> >>
> >> $ /bin/kill --force 639
> >> kill: 639: No such process
> >>
> >>
> >> A similar problem, but without the "error sending signal" message,
> >> occurs if the timer is not used but the parent process issues SIGSTOP
> >> SIGALRM SIGCONT ... sequences.
> > Thanks for the report, especially for the test case. I was able to
> > easily reproduce the issue. However, I haven't found the cause until
> > today. I spent 3 days investigating and discovered three bugs that
> > prevent the test case from behaving as expected.
> >
> > I'll submit the patch series shortly.
>
> Testcase works as expected with 3.6.0-0.419.g3c1308ed890e.x86_64, thanks!
>
>
> Unfortunately signals may be lost, a new testcase is attached:
>
> $ uname -r
> 3.6.0-0.419.g3c1308ed890e.x86_64
>
> $ gcc -o lostsig lostsig.c
>
> $ ./lostsig
> 1157: fork()=1158
> SIGALRM x 10
> [ALRM]
> [ALRM]
> [ALRM]
> SIGSTOP
> [ALRM]
> SIGTERM
> SIGCONT
> waitpid()...
> [TERM]
> 1158: 4 SIGALRM received, exit(42)
> waidpid()=1158, status=0x2a00
>
> $ ./lostsig
> 1163: fork()=1164
> SIGALRM x 10
> SIGSTOP
> SIGTERM
> SIGCONT
> waitpid()...
> [ALRM]
> [TERM]
> ...hangs...
>
>
> A 'ps' in a second terminal then shows that the child process is still
> in S)topped state. 'kill -CONT ...' works to continue.
>
> If the testcase is assigned to a single core with 'taskset 0x1 ...', it
> apparently always hangs.

Thanks for the report and the testcase.

The current implementation of the signal queue has the following problems:
1) Signals in the queue are processed out of order.
2) If the same signal is already in the queue, a new signal is discarded.

I am working on this issue and have almost finished; I'm testing the fix now.
Please wait a while.

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>
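
For readers following the thread, the two queue problems listed above can be
illustrated with a toy model. This is only a sketch and not Cygwin's actual
code: the struct sigq type, the SIGQ_CAP constant, and the functions
sigq_init, sigq_len, sigq_push_dedup, sigq_push_fifo and sigq_pop are all
hypothetical names invented for this illustration. It contrasts a push that
discards a signal number that is already pending with one that always
appends, and pops the oldest entry first so that order is preserved.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define SIGQ_CAP 64  /* arbitrary capacity for this toy model */

/* Hypothetical toy queue of pending signal numbers (not Cygwin's code). */
struct sigq {
    int sig[SIGQ_CAP];
    size_t head;  /* index of the oldest pending entry */
    size_t tail;  /* index of the next free slot */
};

void sigq_init(struct sigq *q)
{
    q->head = q->tail = 0;
}

size_t sigq_len(const struct sigq *q)
{
    return q->tail - q->head;
}

/* Problem 2 in miniature: refuse a signal number that is already pending.
   Returns 1 if the signal was queued, 0 if it was discarded. */
int sigq_push_dedup(struct sigq *q, int signo)
{
    for (size_t i = q->head; i < q->tail; i++)
        if (q->sig[i] == signo)
            return 0;
    assert(q->tail < SIGQ_CAP);
    q->sig[q->tail++] = signo;
    return 1;
}

/* Alternative: always append, so no signal is dropped. */
int sigq_push_fifo(struct sigq *q, int signo)
{
    assert(q->tail < SIGQ_CAP);
    q->sig[q->tail++] = signo;
    return 1;
}

/* Take the oldest entry first, so processing order matches arrival order.
   Returns 0 when the queue is empty. */
int sigq_pop(struct sigq *q)
{
    if (q->head == q->tail)
        return 0;
    return q->sig[q->head++];
}
```

In this model, pushing SIGALRM twice and then SIGTERM leaves two entries with
sigq_push_dedup but three with sigq_push_fifo, and sigq_pop hands them back
oldest first.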