On Wed, 5 Mar 2025 11:23:26 +0100
Christian Franke wrote:
> Takashi Yano via Cygwin wrote:
> > On Mon, 24 Feb 2025 11:29:59 +0100
> > Christian Franke wrote:
> >> Found with 'stress-ng --cpu-sched 1':
> >>
> >> Testcase (attached):
> >>
> >> $ uname -r
> >> 3.6.0-0.387.g8cebbb2b42bf.x86_64
> >>
> >> $ gcc -o timersig timersig.c
> >>
> >> $ ./timersig
> >> 638: fork()=639
> >> !!!!!!!!!!!!!...!!!!!!!!!!!!!SIGSTOP: Permission denied
> >>       0 [itimer] timersig 639 sig_send: error sending signal 14, pid 639,
> >> pipe handle 0x14C, nb 0, packsize 192, Win32 error 0
> >> SIGKILL: Permission denied
> >>
> >> $ kill 639
> >> -bash: kill: (639) - Permission denied
> >>
> >> $ kill -9 639
> >> -bash: kill: (639) - Permission denied
> >>
> >> $ /bin/kill --force 639
> >>
> >> $ /bin/kill --force 639
> >> kill: 639: No such process
> >>
> >>
> >> A similar problem, but without the "error sending signal" message,
> >> occurs if the timer is not used but the parent process issues SIGSTOP
> >> SIGALRM SIGCONT ... sequences.
> > Thanks for the report, especially for the test case. I was able to
> > easily reproduce the issue. However, I did not find the cause until
> > today. I spent 3 days investigating and discovered three bugs that
> > prevent the test case from behaving as expected.
> >
> > I'll submit the patch series shortly.
> 
> Testcase works as expected with 3.6.0-0.419.g3c1308ed890e.x86_64, thanks!
> 
> 
> Unfortunately, signals may be lost; a new testcase is attached:
> 
> $ uname -r
> 3.6.0-0.419.g3c1308ed890e.x86_64
> 
> $ gcc -o lostsig lostsig.c
> 
> $ ./lostsig
> 1157: fork()=1158
> SIGALRM x 10
> [ALRM]
> [ALRM]
> [ALRM]
> SIGSTOP
> [ALRM]
> SIGTERM
> SIGCONT
> waitpid()...
> [TERM]
> 1158: 4 SIGALRM received, exit(42)
> waidpid()=1158, status=0x2a00
> 
> $ ./lostsig
> 1163: fork()=1164
> SIGALRM x 10
> SIGSTOP
> SIGTERM
> SIGCONT
> waitpid()...
> [ALRM]
> [TERM]
> ...hangs...
> 
> 
> A 'ps' in a second terminal then shows that the child process is still
> in S)topped state. 'kill -CONT ...' works to continue it.
> 
> If the testcase is assigned to a single core with 'taskset 0x1 ...', it 
> apparently always hangs.

Thanks for the report and the testcase.
The current implementation of the signal queue has the following problems:
1) Signals in the queue are not processed in the order they were queued.
2) If the same signal is already in the queue, the new signal is discarded.
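
For illustration only, the two problems listed above can be contrasted with
a queue that avoids both. The following is a minimal sketch in C of a
fixed-size FIFO that keeps arrival order and does not drop duplicates. The
names (struct sigq, sigq_init, sigq_push, sigq_pop, SIGQ_CAP) are
hypothetical, and this is a simplified model, not the Cygwin signal queue
or the pending patch.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

#define SIGQ_CAP 64 /* hypothetical capacity, chosen only for the sketch */

/* Hypothetical ring buffer of pending signal numbers (not Cygwin's type). */
struct sigq {
    int    sig[SIGQ_CAP];
    size_t head;  /* index of the oldest queued signal */
    size_t count; /* number of signals currently queued */
};

static void sigq_init(struct sigq *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Append at the tail. A signal number that is already queued is still
 * stored, so repeated signals are not discarded. Fails only when full. */
static bool sigq_push(struct sigq *q, int signo)
{
    if (q->count == SIGQ_CAP)
        return false;
    q->sig[(q->head + q->count) % SIGQ_CAP] = signo;
    q->count++;
    return true;
}

/* Remove and return the oldest signal, so entries are handled in the
 * order they arrived. Returns 0 when the queue is empty. */
static int sigq_pop(struct sigq *q)
{
    if (q->count == 0)
        return 0;
    int signo = q->sig[q->head];
    q->head = (q->head + 1) % SIGQ_CAP;
    q->count--;
    return signo;
}
```

With such a queue, pushing SIGALRM, SIGALRM, SIGSTOP, SIGCONT yields exactly
those four entries back from sigq_pop() in the same order.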

I am working on a fix for this issue and it is almost finished.

I am testing it now. Please wait a while.

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple