On Fri, Nov 18, 2022 at 7:42 AM Joel Knight <knight.j...@gmail.com> wrote:
>
> Hi.
>
> I'm looking for guidance on how to troubleshoot a piece of software
> which is spinning after calling fork(2).

Hi. I've been digging into this more and think I've found a bug in the
threading code.

Consider:
- Process (A) forks a child (B)
- (B) creates and reaps one or more threads (pthread_create/pthread_join)
- (B) forks child (C)
- (C) creates and reaps one or more threads (pthread_create/pthread_join)
- (C) tries to fork child (D)

With this sequence of events, (C) doesn't return from fork(); it's
stuck spinning on the `atfork` lock.

1. Because (B) created a thread, its `__isthreaded` flag is set high
so libc's fork() grabs the atfork lock when (B) is forking (C).
2. When (C) is created, it inherits the state of the atfork lock from
its parent.
3. When (C) comes to life inside libc's fork(), libc makes a call to
release the atfork lock, but (C)'s `__isthreaded` has been set low by
_dofork(), so the release is short-circuited inside _ATFORK_UNLOCK()
and the lock stays held.
4. When (C) re-enters fork() because it's trying to create (D), libc
tries to acquire the atfork lock and spins because (C) is already
holding it.

I don't know the threading or libc parts of the OS well enough to know
what an appropriate fix is here. Ideas which come to mind are
unconditionally releasing the lock inside _ATFORK_UNLOCK() or forcing
the lock to be unlocked, possibly inside _dofork(), when the new
process is created (there are 3 other locks which get this behavior in
there already). I don't know what the implications are for either of
these ideas.

Hoping someone who knows this part of the OS can offer some thoughts.



.joel
