On Fri, 7 Mar 2025 16:29:51 +0900
Takashi Yano wrote:
> On Wed, 5 Mar 2025 11:23:26 +0100
> Christian Franke wrote:
> > Takashi Yano via Cygwin wrote:
> > > On Mon, 24 Feb 2025 11:29:59 +0100
> > > Christian Franke wrote:
> > >> Found with 'stress-ng --cpu-sched 1':
> > >>
> > >> Testcase (attached):
> > >>
> > >> $ uname -r
> > >> 3.6.0-0.387.g8cebbb2b42bf.x86_64
> > >>
> > >> $ gcc -o timersig timersig.c
> > >>
> > >> $ ./timersig
> > >> 638: fork()=639
> > >> !!!!!!!!!!!!!...!!!!!!!!!!!!!SIGSTOP: Permission denied
> > >>       0 [itimer] timersig 639 sig_send: error sending signal 14, pid 639,
> > >> pipe handle 0x14C, nb 0, packsize 192, Win32 error 0
> > >> SIGKILL: Permission denied
> > >>
> > >> $ kill 639
> > >> -bash: kill: (639) - Permission denied
> > >>
> > >> $ kill -9 639
> > >> -bash: kill: (639) - Permission denied
> > >>
> > >> $ /bin/kill --force 639
> > >>
> > >> $ /bin/kill --force 639
> > >> kill: 639: No such process
> > >>
> > >>
> > >> A similar problem, but without the "error sending signal" message,
> > >> occurs if the timer is not used but the parent process issues SIGSTOP
> > >> SIGALRM SIGCONT ... sequences.
> > > Thanks for the report, especially for the test case. I was able to
> > > easily reproduce the issue. However, I did not find the cause until
> > > today. I spent 3 days investigating and discovered three bugs that
> > > prevent the test case from behaving as expected.
> > >
> > > I'll submit the patch series shortly.
> > 
> > Testcase works as expected with 3.6.0-0.419.g3c1308ed890e.x86_64, thanks!
> > 
> > 
> > Unfortunately signals may be lost, a new testcase is attached:
> > 
> > $ uname -r
> > 3.6.0-0.419.g3c1308ed890e.x86_64
> > 
> > $ gcc -o lostsig lostsig.c
> > 
> > $ ./lostsig
> > 1157: fork()=1158
> > SIGALRM x 10
> > [ALRM]
> > [ALRM]
> > [ALRM]
> > SIGSTOP
> > [ALRM]
> > SIGTERM
> > SIGCONT
> > waitpid()...
> > [TERM]
> > 1158: 4 SIGALRM received, exit(42)
> > waidpid()=1158, status=0x2a00
> > 
> > $ ./lostsig
> > 1163: fork()=1164
> > SIGALRM x 10
> > SIGSTOP
> > SIGTERM
> > SIGCONT
> > waitpid()...
> > [ALRM]
> > [TERM]
> > ...hangs...
> > 
> > 
> > A 'ps' in a second terminal then shows that the child process is still 
> > in S)topped state. 'kill -CONT ...' works to continue.
> > 
> > If the testcase is assigned to a single core with 'taskset 0x1 ...', it 
> > apparently always hangs.
> 
> Thanks for the report and the testcase.
> The current implementation of the signal queue has the following problems:
> 1) Signals in the queue are processed out of order.
> 2) If the same signal is already in the queue, the new signal is discarded.
> 
> I am working on this issue and have almost finished.
> 
> I am testing the fix now. Please wait a while.

BTW, the result of your testcase on Linux is as follows:

231873: fork()=231874
SIGALRM x 10
[ALRM]
[ALRM]
[ALRM]
SIGSTOP
SIGTERM
SIGCONT
waitpid()...
[TERM]
231874: 3 SIGALRM received, exit(42)
waidpid()=231874, status=0x2a00

Signals are lost on Linux as well (only 3 of the 10 SIGALRMs are
received). However, the testcase does not hang on Linux, so I guess
SIGSTOP/SIGCONT are never lost there.

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>
