After way too much time, I figured it out, after a quote from [this post](http://fixunix.com/questions/379652-sending-eof-named-pipe.html) jumped out at me:
> See the "I/O on Pipes and FIFOs" section of pipe(7) ("man 7 pipe") > > "If all file descriptors referring to the write end of a pipe have > been closed, then an attempt to read(2) from the pipe will see > end-of-file (read(2) will return 0)." I should've known this, but it never occurred to me - had nothing to do with Python in particular. What was happening was: the subprocesses were getting forked with open (writer) file descriptors to each others' pipes. As long as there are open writer file descriptors to a pipe, readers won't see EOF. E.g.: p1=Popen(..., stdin=PIPE, ...) # creates a pipe the parent process can write to p2=Popen(...) # inherits the writer FD - as long as p2 exists, p1 won't see EOF Turns out there's a `close_fds` parameter to `Popen`, so the solution is to pass `close_fds=True`. All simple and obvious in hindsight, but still managed to cost at least a couple eyeballs good chunks of time. On Sun, Feb 13, 2011 at 11:52 PM, Yang Zhang <yanghates...@gmail.com> wrote: > Anybody else see this issue? > > On Thu, Feb 10, 2011 at 10:37 AM, Yang Zhang <yanghates...@gmail.com> wrote: >> On Thu, Feb 10, 2011 at 12:28 AM, Jean-Michel Pichavant >> <jeanmic...@sequans.com> wrote: >>> Yang Zhang wrote: >>>> >>>> On Wed, Feb 9, 2011 at 11:01 AM, MRAB <pyt...@mrabarnett.plus.com> wrote: >>>> >>>>> >>>>> On 09/02/2011 01:59, Yang Zhang wrote: >>>>> >>>>>> >>>>>> I reduced a problem I was seeing in my application down into the >>>>>> following test case. In this code, a parent process concurrently >>>>>> spawns 2 (you can spawn more) subprocesses that read a big message >>>>>> from the parent over stdin, sleep for 5 seconds, and write something >>>>>> back. However, there's unexpected waiting happening somewhere, causing >>>>>> the code to complete in 10 seconds instead of the expected 5. >>>>>> >>>>>> If you set `verbose=True`, you can see that the straggling subprocess >>>>>> is receiving most of the messages, then waiting for the last chunk of >>>>>> 3 chars---it's not detecting that the pipe has been closed. >>>>>> Furthermore, if I simply don't do anything with the second process >>>>>> (`doreturn=True`), the first process will *never* see the EOF. >>>>>> >>>>>> Any ideas what's happening? Further down is some example output. >>>>>> Thanks in advance. >>>>>> >>>>>> from subprocess import * >>>>>> from threading import * >>>>>> from time import * >>>>>> from traceback import * >>>>>> import sys >>>>>> verbose = False >>>>>> doreturn = False >>>>>> msg = (20*4096+3)*'a' >>>>>> def elapsed(): return '%7.3f' % (time() - start) >>>>>> if sys.argv[1:]: >>>>>> start = float(sys.argv[2]) >>>>>> if verbose: >>>>>> for chunk in iter(lambda: sys.stdin.read(4096), ''): >>>>>> print>> sys.stderr, '..', time(), sys.argv[1], 'read', >>>>>> len(chunk) >>>>>> else: >>>>>> sys.stdin.read() >>>>>> print>> sys.stderr, elapsed(), '..', sys.argv[1], 'done reading' >>>>>> sleep(5) >>>>>> print msg >>>>>> else: >>>>>> start = time() >>>>>> def go(i): >>>>>> print elapsed(), i, 'starting' >>>>>> p = Popen(['python','stuckproc.py',str(i), str(start)], >>>>>> stdin=PIPE, stdout=PIPE) >>>>>> if doreturn and i == 1: return >>>>>> print elapsed(), i, 'writing' >>>>>> p.stdin.write(msg) >>>>>> print elapsed(), i, 'closing' >>>>>> p.stdin.close() >>>>>> print elapsed(), i, 'reading' >>>>>> p.stdout.read() >>>>>> print elapsed(), i, 'done' >>>>>> ts = [Thread(target=go, args=(i,)) for i in xrange(2)] >>>>>> for t in ts: t.start() >>>>>> for t in ts: t.join() >>>>>> >>>>>> Example output: >>>>>> >>>>>> 0.001 0 starting >>>>>> 0.003 1 starting >>>>>> 0.005 0 writing >>>>>> 0.016 1 writing >>>>>> 0.093 0 closing >>>>>> 0.093 0 reading >>>>>> 0.094 1 closing >>>>>> 0.094 1 reading >>>>>> 0.098 .. 1 done reading >>>>>> 5.103 1 done >>>>>> 5.108 .. 0 done reading >>>>>> 10.113 0 done >>>>>> >>>>>> >>>>> >>>>> I changed 'python' to the path of python.exe and 'stuckproc.py' to its >>>>> full path and tried it with Python 2.7 on Windows XP Pro. It worked as >>>>> expected. >>>>> >>>> >>>> Good point - I didn't specify that I'm seeing this on Linux (Ubuntu >>>> 10.04's Python 2.6). >>>> >>>> >>> >>> python test.py 0.000 0 starting >>> 0.026 0 writing >>> 0.026 0 closing >>> 0.026 0 reading >>> 0.029 .. 0 done reading >>> 0.030 1 starting >>> 0.038 1 writing >>> 0.058 1 closing >>> 0.058 1 reading >>> 0.061 .. 1 done reading >>> 5.026 0 done >>> 5.061 1 done >>> >>> on debian lenny (Python 2.5) >>> >>> JM >>> >> >> FWIW, this is consistently reproduce-able across all the Ubuntu 10.04s >> I've tried. You may need to increase the message size so that it's >> large enough for your system. >> >> -- >> Yang Zhang >> http://yz.mit.edu/ >> > > > > -- > Yang Zhang > http://yz.mit.edu/ > -- Yang Zhang http://yz.mit.edu/ -- http://mail.python.org/mailman/listinfo/python-list