On Fri, Nov 18, 2022 at 7:42 AM Joel Knight <knight.j...@gmail.com> wrote:
>
> Hi.
>
> I'm looking for guidance on how to troubleshoot a piece of software
> which is spinning after calling fork(2).

Hi.

I've been digging into this more and think I've found a bug in the
threading code. Consider:

- Process (A) forks a child (B)
- (B) creates and reaps one or more threads (pthread_create/pthread_join)
- (B) forks child (C)
- (C) creates and reaps one or more threads (pthread_create/pthread_join)
- (C) tries to fork child (D)

With this sequence of events, (C) doesn't return from fork(); it's
stuck spinning on the `atfork` lock.

1. Because (B) created a thread, its `__isthreaded` flag is set, so
   libc's fork() grabs the atfork lock when (B) is forking (C).
2. When (C) is created, it inherits the state of the atfork lock
   (held) from its parent.
3. When (C) comes to life inside libc's fork(), libc makes a call to
   release the atfork lock, but (C)'s `__isthreaded` has been cleared
   by _dofork(), so the release is short-circuited inside
   _ATFORK_UNLOCK().
4. When (C) re-enters fork() because it's trying to create (D), libc
   tries to acquire the atfork lock and spins because (C) is already
   holding it.

I don't know the threading or libc parts of the OS well enough to know
what an appropriate fix is here. Two ideas come to mind:
unconditionally releasing the lock inside _ATFORK_UNLOCK(), or forcing
the lock to be unlocked when the new process is created, possibly
inside _dofork() (there are 3 other locks which get this behavior in
there already). I don't know what the implications are for either of
these ideas.

Hoping someone who knows this part of the OS can offer some thoughts.

.joel