On Sun, Jul 5, 2020 at 10:54 AM Marcin Romaszewicz <marc...@gmail.com> wrote:
>
> I'm hitting a problem using os/exec Cmd.Start to run a process.
>
> I'm setting Cmd.Stdout and Cmd.Stderr to the same instance of an io.Pipe, and
> spawn a goroutine to consume the pipe reader until I reach EOF. I then call
> cmd.Start(), do some additional work, and call cmd.Wait(). The runtime of the
> executable I launch is 15-30 minutes, and stdout/stderr output is minimal, a
> few tens of kB during this 15-30 minute run.
>
> When the pipe reaches EOF or errors out, I close the pipe reader, exit the
> goroutine reading the pipe, and that's when cmd.Wait() returns, exactly as
> documented.
>
> This works exactly as described about 70% of the time. The remaining 30% of
> the time, cmd.Wait() returns an error, which stringifies as "signal: broken
> pipe". I'm running thousands of copies of this executable across thousands of
> instances in AWS, so I have a big data set here. The broken pipe error
> happens at the very end when my exec'd executable is exiting, so as far as I
> can tell, it has run successfully and is hitting this error on exit.
>
> I realize that SIGPIPE and EPIPE are common ways that processes clean each
> other up, and that shells do a lot of work hiding them, so I've also tried
> using exec.Cmd to spawn bash, which in turn runs my executable, but I still
> get a lot of these deaths due to SIGPIPE.
>
> I've tried to reproduce this with simple commands, like `cat <longfile.txt>`,
> and none of these simple commands ever result in the broken pipe; I capture
> all their output without issue. The command I'm running differs in that it
> uses quite a lot of resources and the machine is doing significant work when
> the executable is exiting. However, the SIGPIPE is being received by the
> application, not my Go code, implying that the Go side is closing the pipe.
> I can't find where this is happening.
>
> Any tips on how to chase this down?
The executable is dying due to receiving a SIGPIPE signal. As you know, that means that it made a write system call to a pipe that had no open readers.

If you're confident that you are reading all the data from the pipe in the Go program, then the natural first thing to check is the other possible pipe: if you are reading from stdout, check what happens on stderr, and vice versa.

That probably won't help, though. Since you can reproduce the problem with some reliability, try running the whole system under strace -f. That will show you the system calls of both your program and the subprocess, and should let you determine exactly which write is triggering the SIGPIPE and verify that the read end of the pipe has been closed.

And if that doesn't help, perhaps you can modify the subprocess to catch SIGPIPE and get a stack trace, again with the goal of finding out exactly which write is failing.

Hope this helps.

Ian