On Fri, Mar 7, 2025 at 9:01 AM Takashi Yano via Cygwin <cygwin@cygwin.com> wrote:
>
> On Fri, 7 Mar 2025 16:29:51 +0900
> Takashi Yano wrote:
> > On Wed, 5 Mar 2025 11:23:26 +0100
> > Christian Franke wrote:
> > > Takashi Yano via Cygwin wrote:
> > > > On Mon, 24 Feb 2025 11:29:59 +0100
> > > > Christian Franke wrote:
> > > >> Found with 'stress-ng --cpu-sched 1':
> > > >>
> > > >> Testcase (attached):
> > > >>
> > > >> $ uname -r
> > > >> 3.6.0-0.387.g8cebbb2b42bf.x86_64
> > > >>
> > > >> $ gcc -o timersig timersig.c
> > > >>
> > > >> $ ./timersig
> > > >> 638: fork()=639
> > > >> !!!!!!!!!!!!!...!!!!!!!!!!!!!SIGSTOP: Permission denied
> > > >> 0 [itimer] timersig 639 sig_send: error sending signal 14, pid 639,
> > > >> pipe handle 0x14C, nb 0, packsize 192, Win32 error 0
> > > >> SIGKILL: Permission denied
> > > >>
> > > >> $ kill 639
> > > >> -bash: kill: (639) - Permission denied
> > > >>
> > > >> $ kill -9 639
> > > >> -bash: kill: (639) - Permission denied
> > > >>
> > > >> $ /bin/kill --force 639
> > > >>
> > > >> $ /bin/kill --force 639
> > > >> kill: 639: No such process
> > > >>
> > > >>
> > > >> A similar problem, but without the "error sending signal" message,
> > > >> occurs if the timer is not used but the parent process issues SIGSTOP
> > > >> SIGALRM SIGCONT ... sequences.
> > > > Thanks for the report, especially for the test case. I was able to
> > > > easily reproduce the issue. However, I did not find the cause until
> > > > today. I spent 3 days investigating and discovered three bugs that
> > > > prevent the test case from behaving as expected.
> > > >
> > > > I'll submit the patch series shortly.
> > >
> > > Testcase works as expected with 3.6.0-0.419.g3c1308ed890e.x86_64, thanks!
> > >
> > >
> > > Unfortunately signals may be lost, a new testcase is attached:
> > >
> > > $ uname -r
> > > 3.6.0-0.419.g3c1308ed890e.x86_64
> > >
> > > $ gcc -o lostsig lostsig.c
> > >
> > > $ ./lostsig
> > > 1157: fork()=1158
> > > SIGALRM x 10
> > > [ALRM]
> > > [ALRM]
> > > [ALRM]
> > > SIGSTOP
> > > [ALRM]
> > > SIGTERM
> > > SIGCONT
> > > waitpid()...
> > > [TERM]
> > > 1158: 4 SIGALRM received, exit(42)
> > > waidpid()=1158, status=0x2a00
> > >
> > > $ ./lostsig
> > > 1163: fork()=1164
> > > SIGALRM x 10
> > > SIGSTOP
> > > SIGTERM
> > > SIGCONT
> > > waitpid()...
> > > [ALRM]
> > > [TERM]
> > > ...hangs...
> > >
> > >
> > > A 'ps' in a second terminal then shows that the child process is still
> > > in S)topped state. 'kill -CONT ...' works to continue.
> > >
> > > If the testcase is assigned to a single core with 'taskset 0x1 ...', it
> > > apparently always hangs.
> >
> > Thanks for the report and the testcase.
> > The current implementation of the signal queue has the following problems:
> > 1) Signals in the queue are processed in a disordered manner.
> > 2) If the same signal is already in the queue, the new signal is discarded.
> >
> > I am working on this issue and am almost finished.
> >
> > Now I'm testing. Please wait a while.
>
> BTW, the result of your testcase in Linux is as follows:
>
> 231873: fork()=231874
> SIGALRM x 10
> [ALRM]
> [ALRM]
> [ALRM]
> SIGSTOP
> SIGTERM
> SIGCONT
> waitpid()...
> [TERM]
> 231874: 3 SIGALRM received, exit(42)
> waidpid()=231874, status=0x2a00
>
> Signal-lost also happens. However, it does not hang in Linux.
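
The coalescing visible in the Linux output above (10 SIGALRM sent, 3
received) can be reproduced in isolation. What follows is only a minimal
sketch, assuming Linux semantics; count_queued() is a hypothetical helper
and not part of lostsig.c. It blocks a signal, queues it to the calling
process n times, and then counts how many instances are really pending:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the attached testcases):
 * block signo, queue it to ourselves n times with sigqueue(), then
 * count how many instances are actually pending.
 * Returns the number of pending instances, or -1 on error.
 */
int count_queued(int signo, int n)
{
    sigset_t set, old;
    struct timespec zero = { 0, 0 };   /* zero timeout: poll, do not wait */
    siginfo_t info;
    int count = 0;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    for (int i = 0; i < n; i++) {
        union sigval v;
        v.sival_int = i;
        /* sigqueue() can fail with EAGAIN when no queue slot is left,
         * so the return code is worth checking. */
        if (sigqueue(getpid(), signo, v) != 0) {
            perror("sigqueue");
            break;
        }
    }

    /* Drain every pending instance of signo without blocking. */
    while (sigtimedwait(&set, &info, &zero) == signo)
        count++;

    /* Restore the caller's signal mask. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return count;
}
```

On Linux, count_queued(SIGALRM, 10) should return 1, because a standard
signal that is already pending is not queued a second time, while
count_queued(SIGRTMIN, 10) should return 10, because realtime signals are
queued individually. Other systems may behave differently.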

BTW: If you do testing, PLEASE use |sigqueue()| for |SIGRT*| signals (and
check the return code!) and NOT |kill()|, because |kill()| cannot report
that there was no room left to queue another signal.

> So I guess SIGSTOP/SIGCONT are never lost in Linux.

Traditionally |SIGSTOP|/|SIGCONT| are a special case and bound to the
kernel memory used to maintain the target process.

----

Bye,
Roland
-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)