On Thu, Dec 13, 2018 at 9:33 PM <rodrigo.toste.go...@gmail.com> wrote:
>
> I'm debugging an odd issue with Process.Wait, and wondering whether anyone 
> has ideas or has hit this before:
>
> I have a Go process that starts a child process (it sets DEATH_SIGNAL on 
> that process; not sure if that's relevant).
> A background goroutine then runs Wait on that process in order to send a 
> signal to the main goroutine, which causes it to shut down.
>
> A separate process sends `kill -9` to the child process, and then waits until 
> the Go process exits.
> This separate process waits up to a minute before giving up with an error. 
> We're reaching that timeout, and decided to look further into it.
>
> This is what we see:
> 1. The child process is a zombie
> 2. The Go process is alive and well
> 3. The Go process is waiting on the child process (we double checked the pid)
> 4. It's stuck in this stack for that whole minute:
> ```
> goroutine 22 [syscall]:
> syscall.Syscall6(0xf7, 0x1, 0x136b, 0xc420034de8, 0x1000004, 0x0, 0x0, 0x100090100000000, 0x0, 0x30)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/syscall/asm_linux_amd64.s:44 +0x5 fp=0xc420034d90 sp=0xc420034d88 pc=0x4784e5
> os.(*Process).blockUntilWaitable(0xc4200b4e10, 0xc420034ec8, 0x48e364, 0x136b)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/wait_waitid.go:31 +0xa5 fp=0xc420034e98 sp=0xc420034d90 pc=0x493775
> os.(*Process).wait(0xc4200b4e10, 0x0, 0x0, 0xc4200b4e10)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/exec_unix.go:22 +0x42 fp=0xc420034f20 sp=0xc420034e98 pc=0x48dd12
> os.(*Process).Wait(0xc4200b4e10, 0xc4200b4e10, 0x0, 0x0)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/exec.go:115 +0x2b fp=0xc420034f50 sp=0xc420034f20 pc=0x48d33b
> ```
> 5. This happens rarely, and only when the system is under a lot of stress, 
> so our best guess is that the system is taking a long time to respond to 
> the syscall, or that there is some other issue. The slow-syscall theory is 
> odd, though: we were able to abort the Go process and run ps aux without 
> issues, so it doesn't look like a completely stalled system.


See if you can recreate the problem while running the program under
`strace -f`.  That will help show whether the delay is in the program
or the kernel.

Ian
