On Tue, Feb 21, 2017 at 12:58 PM Erik Bray wrote:
>
> On Mon, Feb 20, 2017 at 11:54 PM, Mark Geisert wrote:
> > Erik Bray wrote:
> >>
> >> On Mon, Feb 20, 2017 at 11:54 AM, Mark Geisert wrote:
> >>>>
> >>>> So my guess was that Cygwin might try to hold on to a handle to a
> >>>> child process at least until it's been explicitly wait()ed. But that
> >>>> does not seem to be the case after all.
> >>>
> >>> You might have missed a subtlety in what I said above. The Python
> >>> interpreter itself is calling wait4() to reap your child process. Cygwin
> >>> has told Python one of its children has died. You won't get the chance
> >>> to wait() for it yourself. Cygwin *does* have a handle to the process,
> >>> but it gets closed as part of Python calling wait4().
> >>
> >> To be clear, wait4() is not called from Python until the script
> >> explicitly calls p.wait().
> >> In other words, when I run this step by step (e.g. in gdb) I don't see a
> >> wait4() call until the point where the script explicitly waits(). I
> >> don't see any reason Python would do this behind the scenes.
> >
> > You're right. I missed the wait in your script and ASSumed too much of the
> > Python interpreter :-( .
> >
> >>>> Anyways, I think it would be nicer if /proc returned at least partial
> >>>> information on zombie processes, rather than an error. I have a patch
> >>>> to this effect for /proc/<pid>/stat, and will add a few more as well.
> >>>> To me /proc/<pid>/stat was the most important because that's the
> >>>> easiest way to check the process's state in the first place! Now I
> >>>> also have to catch EINVAL as well and assume that means a zombie
> >>>> process.
> >>>
> >>> The file /proc/<pid>/stat is there until Cygwin finishes cleanup of the
> >>> child due to Python having wait()ed for it. When you run your test
> >>> script, pay attention to the process state character in those cases
> >>> where you successfully read the stat file. It's often S (stopped, I
> >>> think) or R (running) but I also see Z (zombie) sometimes. Your script
> >>> is in a race with Cygwin, and you cannot guarantee you'll see a killed
> >>> process's state before Cygwin cleans it up.
> >>>
> >>> One way around this *might* be to install a SIGCHLD handler in your
> >>> Python script. If that's possible, that should tell you when your child
> >>> exits.
> >>
> >> Perhaps the Python script is a red herring. I just wrote it to
> >> demonstrate the problem. The difference between where I send stdout
> >> to is strange, but you're likely right that it just comes down to
> >> subtle timing differences. Here's a C program that demonstrates the
> >> same issue more reliably. Interestingly, it works when I run it in
> >> strace (probably just because of the strace overhead) but not when I
> >> run it normally.
> >>
> >> My point in all this is I'm confused why Cygwin would give up its
> >> handles to the Windows process before wait() has been called.
> >>
> >> (In fact, it's pretty confusing to have fopen returning EINVAL which
> >> according to [1] it should only be doing if the mode string were
> >> invalid.)
> >>
> >> Thanks,
> >> Erik
> >>
> >> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html
> >
> > O.K., you may be on to something amiss in the Cygwin DLL. Thanks for the
> > STC in C; that'll help somebody looking further at this. I'm out of ideas.
> > It might be possible to reduce strace overhead somewhat by selecting a
> > smaller set of trace options than the default.
>
> Note: My previous test program had a bug in do_child() (not correctly
> terminating the argv array). The attached program fixes the bug.
> I've also attached a (truncated) strace log from it.
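The workaround described in the quoted thread (read the state character from /proc/<pid>/stat, and treat EINVAL as "probably a zombie") can be sketched in Python roughly like this. The function names are hypothetical, and mapping EINVAL to "Z" is only the heuristic discussed in the thread, not documented Cygwin behaviour.

```python
import errno


def parse_stat_state(stat_line: str) -> str:
    """Return the one-character state field from a /proc/<pid>/stat line.

    The second field (comm) is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split after the *last* ')' instead of
    splitting the whole line on whitespace.
    """
    close = stat_line.rfind(")")
    if close == -1:
        raise ValueError("malformed stat line: no closing ')'")
    rest = stat_line[close + 1:].split()
    if not rest:
        raise ValueError("malformed stat line: no state field")
    return rest[0]


def read_proc_state(pid: int) -> str:
    """Read the state character for pid from /proc (hypothetical helper).

    Heuristic from this thread: on affected Cygwin versions, opening the stat
    file of an exited but not yet reaped child may fail with EINVAL, so that
    error is treated as "zombie". Any other error is re-raised.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_stat_state(f.read())
    except OSError as e:
        if e.errno == errno.EINVAL:
            return "Z"  # assumed zombie; not a documented guarantee
        raise  # e.g. ENOENT: the process is gone entirely
```

Parsing from the last ")" matters because comm is free-form text taken from the program name, so a naive whitespace split can pick the wrong field for unusual names.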
With apologies for re-raising a two-year-old thread: I've finally gotten back to working on my port of psutil [1]. I was getting some confusing errors reading the /proc/[pid]/stat files of recently created processes that had quickly become zombies. I had completely forgotten about this issue until I saw that trying to read the stat file was failing with EINVAL ("Invalid argument"), and something about that rang a bell. So, I can confirm that this is still an issue.

Apparently I wrote that I had a patch to Cygwin for this. I have no idea where that patch is, but I'll look for it or try to reproduce it. I think the idea for the patch was to at least make a zombie process's stat file readable so that the status flag ("Z") can be read, and maybe fill the remaining fields with 0. Once I find and/or reproduce that patch I'll submit it to cygwin-patches.

[1] https://psutil.readthedocs.io/en/latest/

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
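To make the proposed patch idea concrete, here is a rough Python sketch of the kind of placeholder line a zombie's stat file could return: pid, (comm), the "Z" state flag, and 0 for every remaining field. This is not Cygwin code (the real /proc implementation lives in the Cygwin DLL); the function name and the total_fields parameter are hypothetical, and the field count would have to match whatever layout Cygwin's /proc/<pid>/stat actually emits.

```python
def format_zombie_stat(pid: int, comm: str, total_fields: int) -> str:
    """Build a placeholder /proc/<pid>/stat line for a zombie process.

    Hypothetical sketch: keep pid, (comm) and the state flag "Z", and fill
    every remaining field with 0 so that readers can at least see the state.
    total_fields is an assumption and must match the real stat layout.
    """
    if total_fields < 3:
        raise ValueError("stat line needs at least pid, comm and state")
    fields = [str(pid), f"({comm})", "Z"] + ["0"] * (total_fields - 3)
    return " ".join(fields) + "\n"
```

For example, format_zombie_stat(1234, "python3", 6) yields the line "1234 (python3) Z 0 0 0". A reader that only needs the state flag can then parse it normally instead of special-casing an error.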