On 5/9/23, Thomas Passin <li...@tompassin.net> wrote:
>
> I'm not sure if this exactly fits your situation, but if you use
> subprocess with pipes, you can often get a deadlock because the stdout
> (or stderr, I suppose) pipe has a small capacity and fills up quickly
> (at least on Windows),

The pipe size is relatively small on Windows only because
subprocess.Popen uses the default pipe size when it calls WinAPI
CreatePipe(). The default size is 4 KiB, which actually should be big
enough for most cases. If some other pipe size is passed, the value is
"advisory", meaning that it has to be within the allowed range (but
there's no practical limit on the size) and that it gets rounded up to
an allocation boundary (e.g. a multiple of the system's virtual-memory
page size). For example, here's a 256 MiB pipe:

    >>> import _winapi
    >>> hr, hw = _winapi.CreatePipe(None, 256*1024*1024)
    >>> _winapi.WriteFile(hw, b'a' * (256*1024*1024))
    (268435456, 0)
    >>> data = _winapi.ReadFile(hr, 256*1024*1024)[0]
    >>> len(data) == 256*1024*1024
    True

> then it blocks until it is emptied by a read.
> But if you aren't polling, you don't know there is something to read so
> the pipe never gets emptied. And if you don't read it before the pipe
> has filled up, you may lose data.

If there's just one pipe, then there's no potential for deadlock, and
no potential to lose data. If there's a timeout, however, then
communicate() still has to use I/O polling or a thread to avoid
blocking indefinitely in order to honor the timeout.

Note that there's a bug in subprocess on Windows. Popen._communicate()
should create a new thread for each pipe. However, it actually calls
stdin.write() on the current thread, which could block and ignore the
specified timeout. For example, in the following case the timeout of 5
seconds is ignored:

    >>> import subprocess, time
    >>> cmd = 'python -c "import time; time.sleep(20)"'
    >>> t0 = time.time(); p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    >>> r = p.communicate(b'a'*4097, timeout=5); t1 = time.time() - t0
    >>> t1
    20.2162926197052

There's a potential for deadlock when two or more pipes are accessed
synchronously by two threads (e.g. one thread in each process).
For example, reading from one of the pipes blocks one of the threads
because the pipe is empty, while at the same time writing to the other
pipe blocks the other thread because the pipe is full. However, there
will be no deadlock if at least one of the threads always polls the
pipes to ensure that they're ready (i.e. data is available to be read,
or at least PIPE_BUF bytes can be written without blocking), which is
how communicate() is implemented on POSIX. Alternatively, one of the
processes can use a separate thread for each pipe, which is how
communicate() is implemented on Windows.

Note that there are problems with the naive implementation of the
reader threads on Windows, in particular if a pipe handle leaks to
descendants of the child process, which prevents the pipe from
closing. A better implementation on Windows would use named pipes
opened in asynchronous mode on the parent side and synchronous mode on
the child side. Just implement a loop that handles I/O completion
using events, APCs, or an I/O completion port.
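
To make the thread-per-pipe approach concrete, here is a minimal sketch
of how it could look. It is not the actual Popen._communicate() code.
The helper name communicate_threaded() and its parameters are made up
for illustration. Unlike the behavior described in the bug note, this
sketch assumes the stdin write should also run on its own thread, so
that the timeout is enforced by the main thread alone.

```python
import subprocess
import sys
import threading


def communicate_threaded(args, input_data=b"", timeout=None):
    """Hypothetical helper: one thread per pipe, timeout on the main thread."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    results = {}

    def writer():
        # write() may block once the pipe buffer is full, so it must not
        # run on the thread that enforces the timeout.
        try:
            proc.stdin.write(input_data)
        except OSError:
            pass  # the child exited or closed its end of stdin early
        finally:
            try:
                proc.stdin.close()
            except OSError:
                pass

    def reader(name, pipe):
        # read() returns only at EOF, i.e. once every handle for the
        # write end of the pipe has been closed.
        results[name] = pipe.read()
        pipe.close()

    threads = [
        threading.Thread(target=writer, daemon=True),
        threading.Thread(target=reader, args=("stdout", proc.stdout), daemon=True),
        threading.Thread(target=reader, args=("stderr", proc.stderr), daemon=True),
    ]
    for t in threads:
        t.start()

    try:
        # Only the main thread waits with a timeout; the pipe threads
        # keep the buffers drained in the meantime.
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        raise

    for t in threads:
        t.join()
    return proc.returncode, results["stdout"], results["stderr"]


if __name__ == "__main__":
    # The child echoes stdin to stdout, then writes a marker to stderr.
    child = (
        "import sys; data = sys.stdin.buffer.read(); "
        "sys.stdout.buffer.write(data); sys.stderr.buffer.write(b'done')"
    )
    rc, out, err = communicate_threaded(
        [sys.executable, "-c", child], b"a" * (1 << 20), timeout=60
    )
    print(rc, len(out), err)
```

This sketch has the same weakness as the naive reader threads mentioned
above: the final join() can block for as long as a descendant of the
child keeps an inherited pipe handle open. Joining with a timeout, or
the asynchronous named-pipe design, would be needed to avoid that.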