Great catch! However, since the process status is never used (NULL provided to waitpid), why not just set the signal handler for SIGCHLD to SIG_IGN?
This would significantly reduce the number of waitpid invocations. Nadav On Sat, May 31, 2025 at 11:53:05AM +0800, Valentin Lab wrote: > This patch fixes a long-standing zombie-process leak in crond that > appears whenever crond itself is PID 1 (typical in minimal BusyBox > containers). > > Reproducer > ========== > > mkdir /tmp/test-root-crontabs -p > echo "* * * * * sh -c 'sleep 1 &'" > "/tmp/test-root-crontabs/root" > sudo chown root:root -R /tmp/test-root-crontabs > > ## Run busybox crond PID 1 > docker run -d --rm --name busybox \ > -v /tmp/test-root-crontabs/root:/var/spool/cron/crontabs/root:ro \ > busybox:1.37.0-uclibc \ > crond -f -d 0 > /dev/null > > watch -n 1 ' > echo "Expecting more zombies in $((60 - 10#$(date +%S)))s (Ctrl-C to > quit):" > ps -ax -o pid,ppid,stat,comm | egrep "\s+Z\s+" > ' > echo "Cleaning busybox container" > docker stop busybox > > > You should see one new "sleep" zombie each minute. > > If crond is *not* PID 1 (e.g. started via a wrapper shell), the leak > does not appear because the wrapper—now PID 1—reaps the orphans. > > > Root cause > ========== > > crond only calls `waitpid()` for PIDs it tracks. Background tasks > (`sleep 5 &`) become orphans; the kernel re-parents them to PID 1. > When crond *is* PID 1, it inherits these orphans but never waits for > them, so they persist as zombies. > > > Fix > === > > Add a tiny reaper loop: > > while (waitpid(-1, NULL, WNOHANG) > 0); > > Placed at the end of `check_completions()`, it collects any remaining > children. On systems where crond is **not** PID 1 the loop returns > `-ECHILD` immediately, so behaviour and overhead are unchanged. > > > Testing > ======= > > * Reproduced the leak on `busybox:1.37.0-uclibc` and current git master. > * Applied the patch, rebuilt BusyBox statically (with only crond), > bind-mounted the binary into a fresh container; zombie count stays at > 0 after X min. > > Suggested docker commmand for testing with compiled (static) binary in CWD: > > docker run --rm --name busybox \ > -v /tmp/test-root-crontabs/root:/var/spool/cron/crontabs/root:ro \ > -v $PWD/busybox:/bin/crond \ > busybox:1.37.0-uclibc \ > crond -f -d 0 > > * Verified crond still exits cleanly and runs scheduled jobs normally. > > Binary size impact (gcc 13.3.0, x86-64, `-Os`, static): +17 bytes. > > Thanks for your time and for maintaining BusyBox! > > Regards, > Valentin Lab > From 6bd536d80448485ebb5c12cb032286e97ddacb2a Mon Sep 17 00:00:00 2001 > From: Valentin Lab <[email protected]> > Date: Sat, 31 May 2025 09:56:09 +0800 > Subject: [PATCH] crond: reap orphaned grandchildren to prevent zombie buildup > > If a cron job launches a background task, e.g. `sh -c "sleep 5 &"`, > the shell exits immediately and the `sleep` process is re-parented to > PID 1. When BusyBox `crond` itself happens to be PID 1 (common in a > minimal container), those orphans become direct children of `crond`. > Because `crond` only calls waitpid() for the PIDs it explicitly tracks, > these processes remain forever in Z state and the container slowly > fills with zombies. > > Add a small `while (waitpid(-1, NULL, WNOHANG) > 0)` sweep at the end > of check_completions() so any stray children are reaped. When `crond` > is not PID 1 the loop returns -ECHILD immediately, so behaviour and > overhead on a normal system are unchanged. > > Size impact: +12 bytes on x86-64 (gcc 13.3.0 -Os, static) > > Signed-off-by: Valentin Lab <[email protected]> > --- > miscutils/crond.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/miscutils/crond.c b/miscutils/crond.c > index b3762d327..43d83b474 100644 > --- a/miscutils/crond.c > +++ b/miscutils/crond.c > @@ -1001,6 +1001,10 @@ static int check_completions(void) > /* else: r == 0: "process is still running" */ > file->cf_has_running = 1; > } > + > + /* Reap any other children we don't actively track */ > + while (waitpid(-1, NULL, WNOHANG) > 0); > + > //FIXME: if !file->cf_has_running && file->deleted: delete it! > //otherwise deleted entries will stay forever, right? > num_still_running += file->cf_has_running; > -- > 2.43.0 > > _______________________________________________ > busybox mailing list > [email protected] > https://lists.busybox.net/mailman/listinfo/busybox _______________________________________________ busybox mailing list [email protected] https://lists.busybox.net/mailman/listinfo/busybox
