Great catch!

However, since the process status is never used (NULL provided to waitpid),
why not just set the signal handler for SIGCHLD to SIG_IGN?

This would significantly reduce the number of waitpid invocations.

Nadav

On Sat, May 31, 2025 at 11:53:05AM +0800, Valentin Lab wrote:
> This patch fixes a long-standing zombie-process leak in crond that
> appears whenever crond itself is PID 1 (typical in minimal BusyBox
> containers).
> 
> Reproducer
> ==========
> 
>    mkdir /tmp/test-root-crontabs -p
>    echo "* * * * * sh -c 'sleep 1 &'" > "/tmp/test-root-crontabs/root"
>    sudo chown root:root -R /tmp/test-root-crontabs
> 
>    ## Run busybox crond PID 1
>    docker run -d --rm --name busybox \
>           -v /tmp/test-root-crontabs/root:/var/spool/cron/crontabs/root:ro \
>           busybox:1.37.0-uclibc \
>           crond -f -d 0 > /dev/null
> 
>    watch -n 1 '
>      echo "Expecting more zombies in $((60 - 10#$(date +%S)))s (Ctrl-C to 
> quit):"
>      ps -ax -o pid,ppid,stat,comm | egrep "\s+Z\s+"
>    '
>    echo "Cleaning busybox container"
>    docker stop busybox
> 
> 
> You should see one new "sleep" zombie each minute.
> 
> If crond is *not* PID 1 (e.g. started via a wrapper shell), the leak
> does not appear because the wrapper—now PID 1—reaps the orphans.
> 
> 
> Root cause
> ==========
> 
> crond only calls `waitpid()` for PIDs it tracks.  Background tasks
> (`sleep 5 &`) become orphans; the kernel re-parents them to PID 1.
> When crond *is* PID 1, it inherits these orphans but never waits for
> them, so they persist as zombies.
> 
> 
> Fix
> ===
> 
> Add a tiny reaper loop:
> 
>      while (waitpid(-1, NULL, WNOHANG) > 0);
> 
> Placed at the end of `check_completions()`, it collects any remaining
> children.  On systems where crond is **not** PID 1 the loop returns
> `-ECHILD` immediately, so behaviour and overhead are unchanged.
> 
> 
> Testing
> =======
> 
> * Reproduced the leak on `busybox:1.37.0-uclibc` and current git master.
> * Applied the patch, rebuilt BusyBox statically (with only crond),
>   bind-mounted the binary into a fresh container; zombie count stays at
>   0 after X min.
> 
>   Suggested docker commmand for testing with compiled (static) binary in CWD:
> 
>     docker run --rm --name busybox \
>         -v /tmp/test-root-crontabs/root:/var/spool/cron/crontabs/root:ro \
>         -v $PWD/busybox:/bin/crond \
>         busybox:1.37.0-uclibc \
>         crond -f -d 0
> 
> * Verified crond still exits cleanly and runs scheduled jobs normally.
> 
> Binary size impact (gcc 13.3.0, x86-64, `-Os`, static): +17 bytes.
> 
> Thanks for your time and for maintaining BusyBox!
> 
> Regards,
> Valentin Lab

> From 6bd536d80448485ebb5c12cb032286e97ddacb2a Mon Sep 17 00:00:00 2001
> From: Valentin Lab <[email protected]>
> Date: Sat, 31 May 2025 09:56:09 +0800
> Subject: [PATCH] crond: reap orphaned grandchildren to prevent zombie buildup
> 
> If a cron job launches a background task, e.g. `sh -c "sleep 5 &"`,
> the shell exits immediately and the `sleep` process is re-parented to
> PID 1.  When BusyBox `crond` itself happens to be PID 1 (common in a
> minimal container), those orphans become direct children of `crond`.
> Because `crond` only calls waitpid() for the PIDs it explicitly tracks,
> these processes remain forever in Z state and the container slowly
> fills with zombies.
> 
> Add a small `while (waitpid(-1, NULL, WNOHANG) > 0)` sweep at the end
> of check_completions() so any stray children are reaped.  When `crond`
> is not PID 1 the loop returns -ECHILD immediately, so behaviour and
> overhead on a normal system are unchanged.
> 
> Size impact: +12 bytes on x86-64 (gcc 13.3.0 -Os, static)
> 
> Signed-off-by: Valentin Lab <[email protected]>
> ---
>  miscutils/crond.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/miscutils/crond.c b/miscutils/crond.c
> index b3762d327..43d83b474 100644
> --- a/miscutils/crond.c
> +++ b/miscutils/crond.c
> @@ -1001,6 +1001,10 @@ static int check_completions(void)
>                       /* else: r == 0: "process is still running" */
>                       file->cf_has_running = 1;
>               }
> +
> +             /* Reap any other children we don't actively track */
> +             while (waitpid(-1, NULL, WNOHANG) > 0);
> +
>  //FIXME: if !file->cf_has_running && file->deleted: delete it!
>  //otherwise deleted entries will stay forever, right?
>               num_still_running += file->cf_has_running;
> -- 
> 2.43.0
> 

> _______________________________________________
> busybox mailing list
> [email protected]
> https://lists.busybox.net/mailman/listinfo/busybox

_______________________________________________
busybox mailing list
[email protected]
https://lists.busybox.net/mailman/listinfo/busybox

Reply via email to