Overflow cygthreads (those which use simplestub) don't set notify_detached event which may cause timer_delete to hang
Hi,

I think that this is a problem in cygthreads, but since I have been looking at cygwin for less than two weeks I might well be mistaken.

The problem can be reproduced with Cygwin 1.7.9-1 and also with today's checkout of the code.

To reproduce, run the program from the attachment (compiled using: g++ main.cc). You should observe the program hanging when deleting a timer. On my computer it is usually timer 31, but depending on race conditions you might get a different one. If you don't see the problem, try increasing TIMERS.

After spending long hours looking at the cygthread.cc code I have come up with the following patch to fix the problem. I believe that the solution should also be bullet-proof if someone terminates the thread (thread_terminate()) or calls detach(), but since this is the first time I have looked at the cygwin code I might well be wrong.

* cygthread.cc (cygthread::simplestub): Notify that the thread has detached also in freerange thread case.

Any comments are most welcome,

Best wishes,
Rafal

P.S. Please note that another (completely separate) problem exists: freerange threads leak memory in the auto_release case. I will create another post with info about that.

Index: src/winsup/cygwin/cygthread.cc
===
RCS file: /cvs/src/src/winsup/cygwin/cygthread.cc,v
retrieving revision 1.85
diff -u -p -r1.85 cygthread.cc
--- src/winsup/cygwin/cygthread.cc 30 Jul 2011 20:50:23 - 1.85
+++ src/winsup/cygwin/cygthread.cc 25 Aug 2011 18:44:32 -
@@ -136,7 +136,11 @@ cygthread::simplestub (VOID *arg)
   cygthread *info = (cygthread *) arg;
   _my_tls._ctinfo = info;
   info->stack_ptr = &arg;
+  HANDLE notify = info->notify_detached;
   info->callfunc (true);
+  if (notify)
+    SetEvent(notify);
+
   return 0;
 }

main.cc
Description: Binary data
Re: Overflow cygthreads (those which use simplestub) don't set notify_detached event which may cause timer_delete to hang
On Thu, Aug 25, 2011 at 08:06:08PM +0100, Rafal Zwierz wrote:
>* cygthread.cc (cygthread::simplestub): Notify that the thread has
>detached also in freerange thread case.

Looks good. I'll check this in. Thanks.

cgf
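For readers following the thread, the ordering used in the patch (save the handle, run the thread body, then signal through the saved copy) can be illustrated with a small, portable sketch. Everything here is hypothetical: FakeEvent and ThreadInfo are simplified stand-ins, not the real cygthread class or a Win32 HANDLE, and the idea that the thread body may reset the bookkeeping before the stub finishes is an assumption made only to show why copying the handle first is the safer order.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a Win32 event HANDLE; not a Cygwin type.
struct FakeEvent
{
  bool signaled = false;
};

// Hypothetical, heavily simplified stand-in for cygthread.
struct ThreadInfo
{
  FakeEvent *notify_detached = nullptr;
  std::function<void (ThreadInfo &)> func;   // the thread body

  void callfunc () { func (*this); }
};

// Same ordering as the patched simplestub: copy the handle first,
// run the thread body, then signal through the saved copy.
inline void stub_with_notify (ThreadInfo &info)
{
  assert (static_cast<bool> (info.func));    // a thread body must be set
  FakeEvent *notify = info.notify_detached;  // saved before the body runs
  info.callfunc ();
  if (notify)
    notify->signaled = true;                 // stands in for SetEvent (notify)
}

// Same shape as the unpatched stub: nobody is ever notified.
inline void stub_without_notify (ThreadInfo &info)
{
  assert (static_cast<bool> (info.func));
  info.callfunc ();
}
```

With stub_with_notify, a waiter holding the event is released even if the thread body clears info.notify_detached while it runs; with stub_without_notify, the event stays unsignaled, which corresponds to the hang described in the report.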
Extend faq.using to discuss fork failures
Hi all,

Based on the feedback on cygwin-dev, I've put together a revised pair of faq.using entries: one briefly listing the symptoms of fork failures and what to do about them, and the other giving some details about why fork fails (sometimes in spite of everything we do to compensate).

* faq-using.xml (faq.using.fixing-fork-failures): Add.
(faq.using.why-fork-fails): Add.

Thoughts?
Ryan

Index: winsup/doc/faq-using.xml
===
RCS file: /cvs/src/src/winsup/doc/faq-using.xml,v
retrieving revision 1.35
diff -u -r1.35 faq-using.xml
--- winsup/doc/faq-using.xml 4 Aug 2011 18:25:41 - 1.35
+++ winsup/doc/faq-using.xml 26 Aug 2011 01:58:44 -
@@ -1199,3 +1199,92 @@
+
+ Calls to fork fail a lot. How can
+ I fix the problem?
+
+
+ Unix-like applications make extensive use of
+ fork, a function which spawns an exact copy of
+ the running process. Notable fork-using applications include bash
+ (and bash scripts), emacs, gcc, make, perl, python, and
+ ruby. Unfortunately, the Windows ecosystem is quite hostile to a
+ reliable fork implementation, leading to error messages such as:
+
+unable to remap $dll to same address as parent
+couldn't allocate heap
+died waiting for dll loading
+child -1 - died waiting for longjmp before initialization
+STATUS_ACCESS_VIOLATION
+resource temporarily unavailable
+
+ If you find that frequent fork failures interfere with normal
+ use of cygwin, please try the following:
+
+Restart whatever process is trying (and failing) to use
+fork. Sometimes Windows sets up a process
+environment that is even more hostile to fork than usual.
+Ensure that you have eliminated (not just disabled) all
+software on the BLODA (see http://cygwin.com/faq/faq.using.html#faq.using.bloda)
+Install the 'rebase' package, read its README in
+/usr/share/doc/Cygwin, and follow the
+instructions there to run 'rebaseall'.
+
+ Please note that installing new packages or updating existing
+ ones often undoes the effects of rebaseall and causes fork failures
+ to reappear. If so, just run rebaseall again.
+
+
+
+ Why does fork fail so much,
+ anyway? (or: Why does fork still fail even though
+ I ran rebaseall?)
+
+ The semantics of fork require that a forked
+ child process have exactly the same address
+ space layout as its parent. However, Windows provides no native
+ support for cloning address space between processes and several
+ features actively undermine a reliable fork
+ implementation. Three issues are especially prevalent:
+
+DLL base address collisions. Unlike *nix shared
+libraries, which use "position-independent code", Windows shared
+libraries assume a fixed base address. Whenever the hard-wired
+address ranges of two DLLs collide (which occurs quite often), the
+Windows loader must "rebase" one of them to a different
+address. However, it does not resolve collisions consistently, and
+may rebase a different dll and/or move it to a different address
+every time. Cygwin can usually compensate for this effect when it
+involves libraries opened dynamically, but collisions among
+statically-linked dlls (dependencies known at compile time) are
+resolved before cygwin1.dll initializes and
+cannot be fixed afterward. This problem can only be solved by
+removing the base address conflicts which cause the problem,
+usually using the rebaseall package.
+
+Address space layout randomization (ASLR). Starting with
+Vista, Windows implements ASLR, which means that thread stacks,
+heap, memory-mapped files, and statically-linked dlls are placed
+at different (random) locations in each process. This behavior
+interferes with a proper fork, and if an
+unmovable object (process heap or system dll) ends up at the wrong
+location, Cygwin can do nothing to compensate (though it will
+retry a few times automatically). In a 64-bit system, marking
+executables as large address-aware and rebasing dlls to high
+addresses has been reported to help, as ASLR affects only the
+lower 2GB of address space.
+
+DLL injection by BLODA. Badly-behaved applications which
+inject dlls into other processes often manage to clobber important
+sections of the child's address space, leading to base address
+collisions which rebasing cannot fix. The only way to resolve this
+problem is to remove (usually uninstall) the offending
+app.
+In summary, current Windows implementations make it
+impossible to implement a perfectly reliable fork, and occasional
+fork failures are inevitable. PTC.
+
+
+
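As a rough illustration of what a "base address collision" means in the FAQ text above, the following sketch checks a set of preferred image ranges for overlaps. The ImageRange type, the DLL names, and the addresses used with it are all hypothetical and chosen only for the example; this is not how the Windows loader or rebaseall is implemented, just the overlap test that underlies the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of a DLL image: preferred base address and size.
struct ImageRange
{
  std::string name;
  std::uint64_t base;
  std::uint64_t size;
};

// Two half-open ranges [base, base + size) collide when each one
// starts before the other one ends.
inline bool collides (const ImageRange &a, const ImageRange &b)
{
  assert (a.size > 0 && b.size > 0);
  return a.base < b.base + b.size && b.base < a.base + a.size;
}

// Returns the names of every pair of images whose preferred ranges overlap.
inline std::vector<std::pair<std::string, std::string>>
find_collisions (const std::vector<ImageRange> &images)
{
  std::vector<std::pair<std::string, std::string>> out;
  for (std::size_t i = 0; i < images.size (); ++i)
    for (std::size_t j = i + 1; j < images.size (); ++j)
      if (collides (images[i], images[j]))
        out.emplace_back (images[i].name, images[j].name);
  return out;
}
```

Given two made-up images that both prefer overlapping ranges near the same base, find_collisions reports that pair; the loader would have to move one of them, and if parent and child end up with different placements, the fork fails as the FAQ entry describes.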
Re: Extend faq.using to discuss fork failures
Ooops. Mailer hiccup. Please ignore this one.

On 25/08/2011 10:08 PM, Ryan Johnson wrote:
> Hi all,
>
> Based on the feedback on cygwin-dev, I've put together a revised pair of faq.using entries: one listing briefly the symptoms of fork failures and what to do about it, and the other giving some details about why fork fails (sometimes in spite of everything we do to compensate).
>
> * faq-using.xml (faq.using.fixing-fork-failures): Add.
> (faq.using.why-fork-fails): Add.
>
> Thoughts?
> Ryan