This is just a guess, but when I saw a slow Process.Wait(), I was able to 
solve it by calling syscall.Wait4() directly, per Ian's suggestion. I made a 
little library:

https://github.com/glycerine/bark

that will call
pid, err := syscall.Wait4(w.proc.Pid, &ws, syscall.WNOHANG, nil)


On Thursday, December 13, 2018 at 11:33:20 PM UTC-6, Rodrigo Toste Gomes 
wrote:
>
> I'm debugging an odd issue with Process.Wait, and wondering if anyone 
> would have any ideas / hit this before:
>
> I have a go process, which starts a child process (it sets DEATH_SIGNAL on 
> that process, not sure if it's relevant).
> Then, a background goroutine runs Wait on that process in order to send a 
> signal to the main goroutine, which causes it to shut down.
>
> A separate process sends `kill -9` to the child process, and then waits 
> until the go process exits.
> This separate process waits up to a minute before quitting, and erroring 
> out. We're reaching that timeout, and decided to look further into it.
>
> This is what we see:
> 1. The child process is a zombie
> 2. The go process is alive and well
> 3. The go process is waiting on the child process (we double checked the 
> pid)
> 4. It's stuck in this stack for that whole minute:
> ```
> goroutine 22 [syscall]:
> syscall.Syscall6(0xf7, 0x1, 0x136b, 0xc420034de8, 0x1000004, 0x0, 0x0, 0x100090100000000, 0x0, 0x30)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/syscall/asm_linux_amd64.s:44 +0x5 fp=0xc420034d90 sp=0xc420034d88 pc=0x4784e5
> os.(*Process).blockUntilWaitable(0xc4200b4e10, 0xc420034ec8, 0x48e364, 0x136b)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/wait_waitid.go:31 +0xa5 fp=0xc420034e98 sp=0xc420034d90 pc=0x493775
> os.(*Process).wait(0xc4200b4e10, 0x0, 0x0, 0xc4200b4e10)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/exec_unix.go:22 +0x42 fp=0xc420034f20 sp=0xc420034e98 pc=0x48dd12
> os.(*Process).Wait(0xc4200b4e10, 0xc4200b4e10, 0x0, 0x0)
>         /nix/store/yvw2v009phsdwj191jm4j2wsk28b2gxx-go-1.9.2/share/go/src/os/exec.go:115 +0x2b fp=0xc420034f50 sp=0xc420034f20 pc=0x48d33b
> ```
> 5. This happens rarely, and only when the system is under a lot of stress. 
> Our best guess is that the system is taking a long time to respond to the 
> syscall, or that there is some other issue. The slow syscall theory is odd, 
> though: we were able to abort the go process and run ps aux without issues, 
> so it doesn't look like a completely stalled system.
>
