Hello Mark,

Thanks for chiming in!

Mark H Weaver <m...@netris.org> skribis:

> Does libgc spawn threads that run concurrently with user threads?  If
> so, that would be news to me.  My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.

libgc launches mark threads as soon as it is initialized, I think.

> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called.  The
> finalization thread grabs the GC allocation lock every time it calls
> 'GC_invoke_finalizers'.  All ports backed by POSIX file descriptors
> (including pipes) register finalizers and therefore spawn the
> finalization thread and make work for it to do.

In 2.2 there’s scm_i_finalizer_pre_fork that takes care of shutting down
the finalization thread right before fork.  So the finalization thread
cannot be blamed, AIUI.

> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
> libgc.  You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

That’s definitely a possibility: the signal thread could be allocating
stuff, and thereby taking the alloc lock just at that time.

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.

Here’s a patch:

diff --git a/libguile/posix.c b/libguile/posix.c
index b0fcad5fd..088e75631 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1209,6 +1209,13 @@ SCM_DEFINE (scm_execle, "execle", 2, 0, 1,
 #undef FUNC_NAME
 
 #ifdef HAVE_FORK
+static void *
+do_fork (void *pidp)
+{
+  * (int *) pidp = fork ();
+  return NULL;
+}
+
 SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
             (),
             "Creates a new \"child\" process by duplicating the current \"parent\" process.\n"
@@ -1236,7 +1243,13 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
          "  further behavior unspecified.  See \"Processes\" in the\n"
          "  manual, for more information.\n"),
        scm_current_warning_port ());
-  pid = fork ();
+
+  /* Take the alloc lock to make sure it is released when the child
+     process starts.  Failing to do that the child process could start
+     in a state where the alloc lock is taken and will never be
+     released.  */
+  GC_call_with_alloc_lock (do_fork, &pid);
+
   if (pid == -1)
     SCM_SYSERROR;
   return scm_from_int (pid);

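To illustrate the idea outside of Guile and libgc, here is a minimal sketch that uses a plain pthread mutex as a stand-in for the alloc lock.  The names 'alloc_lock', 'call_with_lock' and 'child_can_lock_after_fork' are hypothetical and only mimic the shape of the patch; this is not libgc's implementation.  It assumes that a default mutex locked by the forking thread can be unlocked by that same thread in the child, which is what the usual pthread_atfork idiom relies on.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for libgc's allocation lock.  */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical analogue of 'GC_call_with_alloc_lock': call FN (DATA)
   with the lock held and release it afterwards.  */
void *
call_with_lock (void *(*fn) (void *), void *data)
{
  void *result;

  pthread_mutex_lock (&alloc_lock);
  result = fn (data);
  pthread_mutex_unlock (&alloc_lock);
  return result;
}

/* Same shape as 'do_fork' in the patch: store the result of fork.  */
void *
do_fork (void *pidp)
{
  *(pid_t *) pidp = fork ();
  return NULL;
}

/* Fork while holding the lock, then check from the child that the lock
   can be taken.  Return 1 if it can, 0 if it cannot, -1 on error.  */
int
child_can_lock_after_fork (void)
{
  pid_t pid;
  int status;

  /* No other thread can hold the lock at the time of the fork, since
     the forking thread owns it.  Both the parent and the child release
     it when 'call_with_lock' returns.  */
  call_with_lock (do_fork, &pid);
  if (pid == -1)
    return -1;

  if (pid == 0)
    {
      /* Child: report through the exit status whether the lock is free.  */
      int rc = pthread_mutex_trylock (&alloc_lock);
      _exit (rc == 0 ? 0 : 1);
    }

  /* Parent: collect the child's verdict.  */
  if (waitpid (pid, &status, 0) == -1)
    return -1;
  return WIFEXITED (status) && WEXITSTATUS (status) == 0;
}
```

Calling 'child_can_lock_after_fork' from a test program should return 1 under the assumption above.  With a bare fork instead, another thread could own the lock at the moment of the fork; that thread does not exist in the child, so the lock would stay taken there forever.
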
Thoughts?

Unfortunately my ‘call-with-decompressed-port’ reproducer doesn’t seem
to reproduce much today so I can’t tell if this helps (I let it run more
than 5 minutes with the supposedly-buggy Guile and nothing happened…).

Thanks,
Ludo’.