Re: lost signal (was: cygwin 3.6.0: Signals may fail permanently if received after SIGSTOP)

Roland Mainz via Cygwin Fri, 07 Mar 2025 01:11:24 -0800

On Fri, Mar 7, 2025 at 9:01 AM Takashi Yano via Cygwin
<cygwin@cygwin.com> wrote:
>
> On Fri, 7 Mar 2025 16:29:51 +0900
> Takashi Yano wrote:
> > On Wed, 5 Mar 2025 11:23:26 +0100
> > Christian Franke wrote:
> > > Takashi Yano via Cygwin wrote:
> > > > On Mon, 24 Feb 2025 11:29:59 +0100
> > > > Christian Franke wrote:
> > > >> Found with 'stress-ng --cpu-sched 1':
> > > >>
> > > >> Testcase (attached):
> > > >>
> > > >> $ uname -r
> > > >> 3.6.0-0.387.g8cebbb2b42bf.x86_64
> > > >>
> > > >> $ gcc -o timersig timersig.c
> > > >>
> > > >> $ ./timersig
> > > >> 638: fork()=639
> > > >> !!!!!!!!!!!!!...!!!!!!!!!!!!!SIGSTOP: Permission denied
> > > >>       0 [itimer] timersig 639 sig_send: error sending signal 14, pid 
> > > >> 639,
> > > >> pipe handle 0x14C, nb 0, packsize 192, Win32 error 0
> > > >> SIGKILL: Permission denied
> > > >>
> > > >> $ kill 639
> > > >> -bash: kill: (639) - Permission denied
> > > >>
> > > >> $ kill -9 639
> > > >> -bash: kill: (639) - Permission denied
> > > >>
> > > >> $ /bin/kill --force 639
> > > >>
> > > >> $ /bin/kill --force 639
> > > >> kill: 639: No such process
> > > >>
> > > >>
> > > >> A similar problem, but without the "error sending signal" message,
> > > >> occurs if the timer is not used but the parent process issues SIGSTOP
> > > >> SIGALRM SIGCONT ... sequences.
> > > > Thanks for the report, especially for the test case. I was able to
> > > > easily reproduce the issue. However, I haven't found the cause until
> > > > today. I spent 3 days investigating and discovered three bugs that
> > > > prevent the test case from behaving as expected.
> > > >
> > > > I'll submit the patch series shortly.
> > >
> > > Testcase works as expected with 3.6.0-0.419.g3c1308ed890e.x86_64, thanks!
> > >
> > >
> > > Unfortunately signals may be lost, a new testcase is attached:
> > >
> > > $ uname -r
> > > 3.6.0-0.419.g3c1308ed890e.x86_64
> > >
> > > $ gcc -o lostsig lostsig.c
> > >
> > > $ ./lostsig
> > > 1157: fork()=1158
> > > SIGALRM x 10
> > > [ALRM]
> > > [ALRM]
> > > [ALRM]
> > > SIGSTOP
> > > [ALRM]
> > > SIGTERM
> > > SIGCONT
> > > waitpid()...
> > > [TERM]
> > > 1158: 4 SIGALRM received, exit(42)
> > > waidpid()=1158, status=0x2a00
> > >
> > > $ ./lostsig
> > > 1163: fork()=1164
> > > SIGALRM x 10
> > > SIGSTOP
> > > SIGTERM
> > > SIGCONT
> > > waitpid()...
> > > [ALRM]
> > > [TERM]
> > > ...hangs...
> > >
> > >
> > > A 'ps' in a second terminal then shows that the child process is still
> > > in S)topped state. 'kill -CONT ...' works to continue.
> > >
> > > If the testcase is assigned to a single core with 'taskset 0x1 ...', it
> > > apparently always hangs.
> >
> > Thanks for the report and the testcase.
> > The current implementation of the signal queue has the following problems:
> > 1) Signals in the queue are processed in a disordered manner.
> > 2) If the same signal is already in the queue, the new signal is discarded.
> >
> > I am working on this issue and am almost finished.
> >
> > Now I'm testing. Please wait a while.
>
> BTW, the result of your testcase on Linux is as follows:
>
> 231873: fork()=231874
> SIGALRM x 10
> [ALRM]
> [ALRM]
> [ALRM]
> SIGSTOP
> SIGTERM
> SIGCONT
> waitpid()...
> [TERM]
> 231874: 3 SIGALRM received, exit(42)
> waidpid()=231874, status=0x2a00
>
> Signal loss also happens. However, it does not hang on Linux.


BTW: If you are testing, PLEASE use |sigqueue()| for |SIGRT*| signals
(and check the return code!) and NOT |kill()|, because |kill()|
cannot report that there was no room left to queue another signal.
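
A minimal sketch of what such a test could look like, assuming a POSIX
system that provides |sigqueue()| and |sigtimedwait()|. The helper names
|block_signal()|, |queue_signals()| and |drain_pending()| are hypothetical
and not taken from the attached testcases. The signal is kept blocked and
dequeued synchronously, so the number of accepted and received instances
can be compared directly:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Block `signo` so that queued instances stay pending until drained. */
int block_signal(int signo)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/*
 * Queue `count` instances of `signo` to `pid` and return how many were
 * accepted. Unlike kill(), sigqueue() reports a failure to queue
 * (for example EAGAIN when no queue slot is left) via its return code.
 */
int queue_signals(pid_t pid, int signo, int count)
{
    int accepted = 0;

    for (int i = 0; i < count; i++) {
        union sigval value;

        value.sival_int = i;    /* payload: sequence number of this instance */
        if (sigqueue(pid, signo, value) != 0) {
            fprintf(stderr, "sigqueue(%d) #%d failed: %s\n",
                    signo, i, strerror(errno));
            break;
        }
        accepted++;
    }
    return accepted;
}

/*
 * Dequeue all pending instances of `signo` without waiting and return
 * how many there were. `signo` must be blocked by the caller.
 */
int drain_pending(int signo)
{
    sigset_t set;
    siginfo_t info;
    struct timespec zero = { 0, 0 };    /* poll only, do not block */
    int received = 0;

    sigemptyset(&set);
    sigaddset(&set, signo);
    while (sigtimedwait(&set, &info, &zero) == signo)
        received++;
    return received;
}
```

A test would call |block_signal(SIGRTMIN)|, then
|queue_signals(getpid(), SIGRTMIN, 10)|, and compare the result with
|drain_pending(SIGRTMIN)|. If fewer instances are accepted than requested,
the sender at least knows about it, which is the point of checking the
return code.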

> So I guess SIGSTOP/SIGCONT are never lost in Linux.

Traditionally |SIGSTOP|/|SIGCONT| are a special case and bound to the
kernel memory used to maintain the target process.

----

Bye
Roland
-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)

