On Thu, Dec 13, 2018 at 9:33 PM <rodrigo.toste.go...@gmail.com> wrote:
>
> I'm debugging an odd issue with Process.Wait, and wondering if anyone would
> have any ideas / hit this before:
>
> I have a Go process, which starts a child process (it sets DEATH_SIGNAL on
> that process, not sure if it's relevant).
> Then, a background goroutine runs Wait on that process in order to send a
> signal to the main goroutine, which causes it to shut down.
>
> A separate process sends `kill -9` to the child process, and then waits until
> the Go process exits.
> This separate process waits up to a minute before quitting and erroring out.
> We're reaching that timeout, and decided to look further into it.
>
> This is what we see:
> 1. The child process is a zombie
> 2. The Go process is alive and well
> 3. The Go process is waiting on the child process (we double-checked the pid)
> 4. It's stuck in this stack for that whole minute:
> ```
> goroutine 22 [syscall]:
> syscall.Syscall6(0xf7, 0x1, 0x136b, 0xc420034de8, 0x1000004, 0x0, 0x0, 0x100090100000000, 0x0, 0x30)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/syscall/asm_linux_amd64.s:44 +0x5 fp=0xc420034d90 sp=0xc420034d88 pc=0x4784e5
> os.(*Process).blockUntilWaitable(0xc4200b4e10, 0xc420034ec8, 0x48e364, 0x136b)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/wait_waitid.go:31 +0xa5 fp=0xc420034e98 sp=0xc420034d90 pc=0x493775
> os.(*Process).wait(0xc4200b4e10, 0x0, 0x0, 0xc4200b4e10)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/exec_unix.go:22 +0x42 fp=0xc420034f20 sp=0xc420034e98 pc=0x48dd12
> os.(*Process).Wait(0xc4200b4e10, 0xc4200b4e10, 0x0, 0x0)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/exec.go:115 +0x2b fp=0xc420034f50 sp=0xc420034f20 pc=0x48d33b
> ```
> 5. This happens rarely, and only when the system is under a lot of stress,
> so our best guess is that the system is taking a long time to respond to the
> syscall (which is odd because we were able to abort the Go process and run
> ps aux without issues - it doesn't look like a completely stalled system),
> or some other issue.
See if you can recreate the problem while running the program under `strace -f`. That will help show whether the delay is in the program or the kernel.

Ian