On Tue, Aug 30, 2022 at 7:44 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Buildfarm member mamba (NetBSD-current on prairiedog's former hardware)
> has failed repeatedly since I set it up. I have now run the cause of
> that to ground [1], and here's what's happening: if the postmaster
> receives a signal just before it first waits at the select() in
> ServerLoop, it can self-deadlock. During the postmaster's first use of
> select(), the dynamic loader needs to resolve the PLT branch table entry
> that the core executable uses to reach select() in libc.so, and it locks
> the loader's internal data structures while doing that. If we enter
> a signal handler while the lock is held, and the handler needs to do
> anything that also requires the lock, the postmaster is frozen.
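To make the failure mode concrete, here is a minimal sketch of one way to get the lazy binding out of the way before any handler can run: call select() once with a zero timeout while every signal is blocked. This only illustrates the idea and is not necessarily what the patch does; the function name prebind_select_with_signals_blocked is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper (the name is invented for this sketch).
 *
 * Calls select() once with a zero timeout while all signals are blocked.
 * If the PLT entry for select() is bound lazily, the binding happens
 * during this call, at a time when no signal handler can run and ask
 * for the loader's lock.
 *
 * Returns select()'s result (0 on an immediate timeout), or -1 if the
 * signal mask could not be changed.
 */
static int
prebind_select_with_signals_blocked(void)
{
    sigset_t        all_signals;
    sigset_t        saved_mask;
    struct timeval  zero_timeout = {0, 0};
    int             rc;

    /* Block everything that can be blocked, remembering the old mask. */
    sigfillset(&all_signals);
    if (sigprocmask(SIG_BLOCK, &all_signals, &saved_mask) != 0)
        return -1;

    /* No file descriptors and a zero timeout: returns immediately. */
    rc = select(0, NULL, NULL, NULL, &zero_timeout);

    /* Restore the caller's mask; pending signals are delivered now. */
    if (sigprocmask(SIG_SETMASK, &saved_mask, NULL) != 0)
        return -1;

    return rc;
}
```

Any later select() call then goes straight through the already-resolved PLT entry, so a handler that interrupts it cannot catch the loader mid-bind for that symbol.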
. o O ( pselect() wouldn't have this problem, but it's slightly too new
for the back branches that didn't yet require SUSv3... drat )

> I'd originally intended to make this code "#ifdef __NetBSD__",
> but on looking into the FreeBSD sources I find much the same locking
> logic in their dynamic loader, and now I'm wondering if such behavior
> isn't pretty standard. The added calls should have negligible cost,
> so it doesn't seem unreasonable to do them everywhere.

FWIW I suspect FreeBSD can't break like this in a program linked with
libthr, because it has a scheme for deferring signals while the
runtime linker holds locks. _rtld_bind calls _thr_rtld_rlock_acquire,
which uses the THR_CRITICAL_ENTER mechanism to cause thr_sighandler to
defer until release. For a non-thread program, I'm not entirely sure,
but I don't think the fork() problem exists there. (Could be wrong,
based on a quick look.)

> (Of course, a much better answer is to get out of the business of
> doing nontrivial stuff in signal handlers. But even if we get that
> done soon, we'd surely not back-patch it.)

+1
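
For the record, a minimal sketch of why pselect() sidesteps the problem, assuming a single-threaded program that keeps the signal blocked everywhere except during the wait. The names handle_sigusr1 and wait_with_signals_deferred are made up for the example, and SIGUSR1 just stands in for whatever signals the postmaster handles.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal = 0;

/* The handler only sets a flag. */
static void
handle_sigusr1(int signo)
{
    (void) signo;
    got_signal = 1;
}

/*
 * Hypothetical helper (the name is invented for this sketch).
 *
 * Keeps SIGUSR1 blocked in ordinary code and lets pselect() unblock it
 * only for the duration of the wait.  Returns pselect()'s result, or -2
 * if the setup failed.  SIGUSR1 is still blocked on return.
 */
static int
wait_with_signals_deferred(void)
{
    struct sigaction sa;
    sigset_t         block_mask;
    sigset_t         wait_mask;
    struct timespec  timeout = {1, 0};  /* upper bound on the wait */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -2;

    /* Block SIGUSR1; wait_mask receives the previous mask. */
    sigemptyset(&block_mask);
    sigaddset(&block_mask, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block_mask, &wait_mask) != 0)
        return -2;
    sigdelset(&wait_mask, SIGUSR1);     /* mask to use while waiting */

    /* Simulate a signal arriving just before the first wait. */
    if (raise(SIGUSR1) != 0)
        return -2;

    /*
     * SIGUSR1 is pending but still blocked here, so any lazy binding of
     * pselect() itself completes before the handler can run.  The mask
     * is swapped as part of the call, and the pending signal interrupts
     * the wait.
     */
    return pselect(0, NULL, NULL, NULL, &timeout, &wait_mask);
}
```

With the signal already pending, the call should come back with -1 and EINTR after the handler has set the flag, rather than sleeping for the full timeout.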