On Thu, Aug 06, 2020 at 12:48:10AM +0200, Michał Mirosław wrote:
> On Thu, Aug 06, 2020 at 12:29:36AM +0300, Peter Pentchev wrote:
> > On Wed, Aug 05, 2020 at 10:52:31PM +0200, Michał Mirosław wrote:
> [...]
> > > Using print-debugging, I see that it stops at wait_for_child line just
> > > after printing the version. It seems that something is reaping the child
> > > before the main thread has a chance to wait for it.
> >
> > OK, so the only thing that comes to my mind now is that you may be
> > hitting a crazy, crazy race between register_child() and child_reaper(),
> > and I say "a crazy, crazy race", because the test has to (apparently
> > reproducibly) receive the CHLD signal exactly between the check and
> > the creation in register_child()'s first "$children{...} //= ...cv"
> > statement.
>
> Well, there is nothing that prevents SIGCHLD arriving between fork() and
> register_child(). You could test this with more confidence (though not
> 100%-reliably) by putting 'exit 1' just at the start of ($pid == 0) branch.

Nah, the problem is not just "between fork() and register_child()".
It really must arrive at a very specific moment in time, because
the //= operations for setting $children{$pid}{cv} try to make sure that
a new value is not set (that is, a new condition variable is not
created) if there already is such an element in the hash. So the race
is indeed between the //= in register_child() and the //= in
child_reaper() - that is, child_reaper() must be invoked (SIGCHLD must
arrive) *during* the execution of the //= in register_child().
Unless I'm missing something, which is not at all out of the question :)
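
To illustrate the //= guard described above, here is a minimal sketch,
not the actual test code. The names make_cv() and get_cv() are made up
for this example, make_cv() is only a stand-in for AnyEvent->condvar so
that the sketch runs with core Perl, and 4242 is an arbitrary pid.

```perl
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

my %children;
my $created = 0;

# Hypothetical stand-in for AnyEvent->condvar (an assumption, so that
# the sketch needs no CPAN modules); it counts how many were created.
sub make_cv
{
	$created++;
	return { id => $created };
}

# The same kind of guard that register_child() and child_reaper() use:
# create the condition variable only if none is defined yet.
sub get_cv
{
	my ($pid) = @_;
	$children{$pid}{cv} //= make_cv();
	return $children{$pid}{cv};
}

my $pid = 4242;                     # made-up pid, illustration only
my $from_reaper = get_cv($pid);     # e.g. SIGCHLD handled first
my $from_register = get_cv($pid);   # register_child() runs afterwards

say "condvars created: $created";
say 'same condvar: '.($from_reaper == $from_register ? 'yes' : 'no');
```

When the two calls run one after the other, in either order, only one
condition variable is created and both sides see the same one. The
sketch does not try to reproduce a signal arriving in the middle of
the //= statement itself.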

> > Can you apply the following patch and show me the output of running
> > the test?
>
> Sure, but I got no patch. :-)

Oof. Not my day, is it... Here it is... I hope.

G'luck,
Peter
--
Peter Pentchev [email protected] [email protected] [email protected]
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C  F115 C354 651E EFB0 2527 DF13

commit 859acd0603a5bc74620df4949e1450805b7ba151
Author: Peter Pentchev <[email protected]>
Date:   Thu Aug 6 00:26:32 2020 +0300

    Diagnostic output for the runtime test's child reaper.

diff --git a/debian/tests/runtime b/debian/tests/runtime
index f594d9a..81cef23 100755
--- a/debian/tests/runtime
+++ b/debian/tests/runtime
@@ -55,19 +55,25 @@ sub unregister_child_reaper()
 
 sub child_reaper()
 {
+	say 'RDBG child_reaper() invoked';
 	while (1) {
 		my $pid = waitpid -1, WNOHANG;
 		my $status = $?;
+		say "RDBG - pid $pid status $status";
 
 		if (!defined $pid) {
 			die "Could not waitpid() in a SIGCHLD handler: $!\n";
 		} elsif ($pid == 0 || $pid == -1) {
+			say 'RDBG - done';
 			last;
 		} else {
+			say 'RDBG - '.(exists $children{$pid} ? '' : 'not ').'found in the children hash';
 			$children{$pid}{cv} //= AnyEvent->condvar;
+			say 'RDBG - cv '.$children{$pid}{cv}.': '.($children{$pid}{cv}->ready ? '' : 'not ').'ready';
 			$children{$pid}{cv}->send($status);
 		}
 	}
+	say 'RDBG - out of the child_reaper() loop';
 }
 
 sub register_child($ $)
@@ -76,6 +82,7 @@ sub register_child($ $)
 
 	# Weird, but we want it to be at least reasonably atomic-like
 	$children{$pid}{cv} //= AnyEvent->condvar;
 
+	say "register_child: pid $pid cv ".$children{$pid}{cv};
 	my $ch = $children{$pid};
 	$ch->{pid} = $pid;

