On Mon, Mar 21, 2016 at 3:35 PM, Oleg Nesterov <o...@redhat.com> wrote:
> On 03/21, Patrick Donnelly wrote:
>> On Mon, Mar 21, 2016 at 3:07 PM, Oleg Nesterov <o...@redhat.com> wrote:
>> >         case SIGSTOP:
>> >                 /* Black magic to get threads working on old Linux kernels... */
>> >
>> >                 if(p->nsyscalls == 0) { /* stop before we begin running the
>> >                                            process */
>> >                         debug(D_DEBUG, "suppressing bootstrap SIGSTOP for %d",pid);
>> >                         signum = 0; /* suppress delivery */
>> >                         kill(p->pid,SIGCONT);
>> >                 }
>> >                 break;
>> >
>> > doesn't look right. Note that kill(pid,SIGCONT) affects the whole
>> > thread-group. So if this kill() races with another thread doing clone()
>> > you can hit the problem you described.
>>
>> You're right, that should be tkill! I will give that a try and report
>> back if that solved the issue for our collaborators...
>
> Ah, sorry, I should have mentioned this...
>
> No, tkill() won't help. See prepare_signal(), SIGCONT always removes
> the SIG_KERNEL_STOP_MASK signals from all threads, not matter if it was
> sent by tkill() or kill().
>
> Perhaps you should just remove this kill(SIGCONT) ?
>
> tracer_continue(signr => 0) should equally suppress the delivery. To
> clarify this won't be right too, but without PTRACE_SEIZE you simply
> can't write the code which handles the stop/cont/etc events correctly
> anyway...
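For the archives, here is a rough sketch of what the suggested change amounts to: drop the kill(SIGCONT) and suppress the bootstrap SIGSTOP purely by resuming the tracee with signal 0. The helper names signal_to_inject() and resume_tracee() are made up for illustration and are not the actual code. The use of PTRACE_SYSCALL as the resume request is also an assumption.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: pick the signal to hand back to the tracee.
 * Returning 0 means "resume without delivering any signal". */
int signal_to_inject(int stopsig, long nsyscalls)
{
	/* Bootstrap SIGSTOP: reported before the tracee has made any syscall. */
	if (stopsig == SIGSTOP && nsyscalls == 0)
		return 0;
	return stopsig;
}

/* Hypothetical helper: resume a single stopped tracee until its next
 * syscall boundary, injecting sig (0 = none). No kill(SIGCONT) is sent,
 * so pending stop signals of other threads in the group are left alone.
 * Returns -1 with errno set on failure, as ptrace() does. */
long resume_tracee(pid_t tid, int sig)
{
	return ptrace(PTRACE_SYSCALL, tid, NULL, (void *)(long)sig);
}
```

In the SIGSTOP case this would be used roughly as resume_tracee(pid, signal_to_inject(WSTOPSIG(status), p->nsyscalls)). As noted above, this is still not strictly correct without PTRACE_SEIZE, since a plain PTRACE_ATTACH tracer cannot reliably tell a group-stop from a signal-delivery-stop.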
Thanks so much Oleg. Indeed this was the problem.

-- 
Patrick Donnelly