Hi all,

I encountered a strange bug involving zombie child processes during my testing. Specifically, it came up while trying to read the /proc/<pid> entries of processes that should be zombies but have not been reaped yet (i.e. they have received and processed a SIGKILL or SIGTERM but have not been wait()ed on yet).
In some cases it's still possible to read, for example, /proc/<pid>/stat of the zombie process, while in other cases the read fails with errno 22 (EINVAL). In the failing case, the error comes from the OpenProcess() call in format_process_stat() (or similarly in the other format_process_* methods), indicating that Windows has already removed the process object from its process table or, equivalently, that there are no more open handles to the child process.

What I don't understand is whether this is intentional. I feel like Cygwin should try to keep the Windows process object alive for as long as the process is a zombie. But in some cases it does and in some cases it doesn't.

The attached Python script shows two such cases, and I don't quite understand where they differ. In the first case, stdout from the child process is sent to the parent over a pipe. In the second case, stdout from the child is sent to /dev/null. In the first case the process object is kept alive and I can read its /proc entries. In the second case the process object is gone even before wait(). I'm not sure what the difference is in terms of keeping the process object alive.

If Cygwin can't guarantee that the Windows process object is kept around while the process is in the zombie state, it would be nice, as an alternative, to change the error handling in the format_process_* methods so that they return as much info as they can, with the other fields zeroed out, rather than an error.

Best,
Erik
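
For readers without the attachment, here is a minimal sketch of the kind of test described above. It is not the attached script: the helper names read_stat_state() and probe_zombie(), the sleeping child, the choice of SIGTERM, and the polling timeout are all assumptions made for illustration. It kills a child, deliberately does not wait() on it, and then reports either the state field from /proc/<pid>/stat or the OSError raised while reading it.

```python
import os
import signal
import subprocess
import sys
import time


def read_stat_state(pid):
    """Return the state field of /proc/<pid>/stat, or the OSError from reading it."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError as exc:
        return exc
    # The command name is in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def probe_zombie(stdout_target, timeout=5.0):
    """Kill a child, leave it unreaped, and report what /proc/<pid>/stat shows."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        stdout=stdout_target,
    )
    try:
        # os.kill() is used instead of Popen.send_signal(), because newer
        # Python versions poll (and may reap) the child in send_signal().
        os.kill(child.pid, signal.SIGTERM)
        deadline = time.monotonic() + timeout
        result = read_stat_state(child.pid)
        # Poll /proc without calling wait(), so the child stays unreaped.
        while (
            result != "Z"
            and not isinstance(result, OSError)
            and time.monotonic() < deadline
        ):
            time.sleep(0.05)
            result = read_stat_state(child.pid)
        return result
    finally:
        # Clean up: make sure the child is dead, then reap it.
        child.kill()
        child.wait()
        if child.stdout is not None:
            child.stdout.close()


if __name__ == "__main__":
    for name, target in (("pipe", subprocess.PIPE), ("/dev/null", subprocess.DEVNULL)):
        print(name, repr(probe_zombie(target)))
```

Running the two cases side by side should make it easy to see whether the /dev/null case returns an error where the pipe case returns a state, which is the difference reported above.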