On Dec 21 01:30, Tom Honermann wrote:
> I spent most of the week debugging this issue. This appears to be a
> defect in Windows. I can reproduce the issue without Cygwin. I
> can't rule out other third party kernel mode software possibly
> contributing to the issue. A simple change to Cygwin works around
> the problem for me.
>
> I don't know which Windows releases are affected by this. I've only
> reproduced the problem (outside of Cygwin) with Wow64 processes
> running on 64-bit Windows 7. I haven't yet tried elsewhere.
>
> The problem appears to be a race condition involving concurrent
> calls to TerminateProcess() and ExitThread(). The example code
> below minimally mimics the threads created and exit process/thread
> calls that are performed when running Cygwin's false.exe. The
> primary thread exits the process via TerminateProcess() ala
> pinfo::exit() in winsup/cygwin/pinfo.cc. The secondary thread exits
> itself via ExitThread() ala Cygwin's signal processing thread
> function, wait_sig(), in winsup/cygwin/sigproc.cc.
>
> When the race condition results in the undesirable outcome, the exit
> code for the process is set to the exit code for the secondary
> thread's call to ExitThread(). I can only speculate at this point,
> but my guess is that the TerminateProcess() code disassociates the
> calling thread from the process before other threads are stopped
> such that ExitThread(), concurrently running in another thread, may
> determine that the calling thread is the last thread of the process
> and overwrite the process exit code.
>
> The issue also reproduces if ExitProcess() is called in place of
> TerminateProcess(). The test case below only uses
> TerminateProcess() because that is what Cygwin does.
>
> Source code to reproduce the issue follows. Again, Cygwin is not
> required to reproduce the problem. For my own testing, I compiled
> the code using Microsoft's Visual Studio 2010 x86 compiler with the
> command 'cl /Fetest-exit-code.exe test-exit-code.cpp'
>
> test-exit-code.cpp:

Wow. Thanks for this testcase. I tried to reproduce the issue. I was
not able to reproduce it on a single-CPU, single-core setup, but I
could reproduce it almost immediately on a dual-core system, twice in
a row in under 5 seconds.

> The workaround I implemented within Cygwin was simple and sloppy. I
> added a call to Sleep(1000) immediately before the call to
> ExitThread() in wait_sig() in winsup/cygwin/sigproc.cc. Since this
> thread (probably) doesn't exit until the process is exiting anyway,
> the call to Sleep() does not adversely affect shutdown. The thread
> just gets terminated while in the call to Sleep() instead of exiting
> before the process is terminated or getting terminated while still
> in the call to ExitThread(). A better solution might be to avoid
> the thread exiting at all (so long as it can't get terminated while
> holding critical resources), or to have the process exiting thread
> wait on it. Neither of these is ideal. Orderly shutdown of
> multi-threaded processes is really hard to do correctly on Windows.
>
> Since the exit code for the signal processing thread is not used,
> having the wait_sig() thread (and any other threads that could
> potentially concurrently exit with another thread) exit with a
> special status value such as STATUS_THREAD_IS_TERMINATING
> (0xC000004BL) would enable diagnosis of this issue as any process
> exit code matching this would be a likely indicator that this issue
> was encountered.

Maybe the signal thread should really not exit by itself, but just
wait until TerminateThread is called. Chris?


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
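
A minimal sketch of the diagnostic idea from the quoted text, not code
from the Cygwin sources: if threads that may race with process exit
always exit with STATUS_THREAD_IS_TERMINATING, a process exit code equal
to that value hints that the race was hit. The helper name below is
hypothetical; the constant is the value quoted above. On Windows the
process exit code would typically be read with GetExitCodeProcess();
that part is left out here so the snippet stays portable.

```cpp
#include <cstdint>

// Value of STATUS_THREAD_IS_TERMINATING as quoted in the thread above.
constexpr std::uint32_t kStatusThreadIsTerminating = 0xC000004BU;

// Hypothetical helper (assumption, not a Cygwin or Windows API):
// returns true when a process exit code equals the sentinel that the
// helper threads are assumed to pass to ExitThread(). A match is only
// a hint that a thread's exit code overwrote the process exit code,
// since a process could also exit with this value on purpose.
constexpr bool looks_like_leaked_thread_exit(std::uint32_t process_exit_code) {
    return process_exit_code == kStatusThreadIsTerminating;
}
```

A test harness that runs the reproducer in a loop could call
looks_like_leaked_thread_exit() on each observed exit code and count the
matches to estimate how often the race occurs.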