Andy Wingo <wi...@pobox.com> writes:

>> I have found that what actually hangs after a fork are the mutexes
>> supporting the threads: they are kernel-level resources, referenced by
>> ID, and end up being shared between parent and child.
>
> Which ones, precisely?

GDB shows the hanging child process waiting on a mutex down in the
BDW-GC garbage collection library.  The parent has the mutex, and is
blocked waiting to read data from the child.

>> I don’t think there’s any safe way to restore the finalizer thread and
>> support SCSH-style (begin ...) process forms.  Shutting down the
>> finalizer thread is the best we can do.
>
> The finalizer thread should be restored as needed, the next time GC
> calls notify_finalizers_to_run.

If you have two initially identical Guile processes running, each of
which depends on the same external resource needing finalization, how do
you control cleanup in a way that’s safe for both processes?  If either
the parent or the child finalizes, the other process may still be
depending on the external resource.

The correct policy seems to be “Don’t use finalizers with
primitive-fork” in the same way that we say “Don’t use threads with
primitive-fork”.  It’s a problem with (begin ...) process forms, not
with situations where the process execs another program.

> I think also that if you are most interested in a system in which
> primitive-fork plays a large role, then probably you want a Guile
> without threads (including the GC mark threads).  Threads + fork is not
> a recipe for success :)

Understood.  However, saying “primitive-fork requires a separate,
threadless Guile installation” would be very limiting.  I’d like to be
in a position where Guile manages any threads that it creates under the
hood; then it’s up to the program owner to guarantee that there are no
other threads running at fork time.  Right now the under-the-hood
threads appear to be:

  1. The finalizer thread.
  2. The signal delivery thread.

(If you can think of any others, let me know.)

For SCSH-style process forms, the signal delivery thread comes up when
trying to implement SCSH’s “early” child process auto-reap mechanism
(section 3.4.1 in the SCSH manual).  The “early” policy works by setting
up a handler for SIGCHLD.  We are guaranteed to have the signal delivery
thread in that situation.  (The “late” policy hooks into the garbage
collector, so it avoids this problem; it can have pathological edge
cases for long-running programs, however.)

Derek

--
Derek Upham
s...@blarg.net
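
For reference, the “early” auto-reap idea mentioned above can be
sketched in plain POSIX C.  This is only an illustration, not Guile or
SCSH code: it installs a SIGCHLD handler that reaps every exited child
with waitpid and WNOHANG.  The names reaped_count, reap_children,
install_early_reaper, and spawn_and_wait_for_reap are hypothetical, and
the sketch assumes a single-threaded process, so it does not model
Guile's signal delivery thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter: how many children the handler has reaped. */
static volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: reap every child that has already exited.
   waitpid is async-signal-safe; errno is saved and restored so the
   interrupted code does not see it change. */
static void reap_children(int signo)
{
    int saved_errno = errno;
    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Install the handler.  Returns 0 on success, -1 on failure. */
static int install_early_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork a child that exits at once, then sleep until the handler has
   reaped it.  Returns the number of children reaped, or -1 on error. */
static int spawn_and_wait_for_reap(void)
{
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    /* Block SIGCHLD so it cannot arrive between the check and the wait. */
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    int before = reaped_count;
    pid_t pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);               /* child: exit without running atexit hooks */

    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (reaped_count == before)
        sigsuspend(&wait_mask); /* atomically unblock SIGCHLD and wait */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped_count - before;
}
```

A caller would run install_early_reaper() once at startup; after that,
children created with fork() are reaped as soon as SIGCHLD is delivered,
without an explicit wait in the main program.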