Hi,

On 2022-08-29 15:43:55 -0400, Tom Lane wrote:
> Buildfarm member mamba (NetBSD-current on prairiedog's former hardware)
> has failed repeatedly since I set it up. I have now run the cause of
> that to ground [1], and here's what's happening: if the postmaster
> receives a signal just before it first waits at the select() in
> ServerLoop, it can self-deadlock. During the postmaster's first use of
> select(), the dynamic loader needs to resolve the PLT branch table entry
> that the core executable uses to reach select() in libc.so, and it locks
> the loader's internal data structures while doing that. If we enter
> a signal handler while the lock is held, and the handler needs to do
> anything that also requires the lock, the postmaster is frozen.
Ick.

> The attached patch seems to fix the problem, by forcing resolution of
> the PLT link before we unblock signals. It depends on the assumption
> that another select() call appearing within postmaster.c will share
> the same PLT link, which seems pretty safe.

Hm, what stops the same problem from occurring with other functions?

Perhaps it'd be saner to default to building with -Wl,-z,now? That should
fix the problem too, right (and if we combine it with relro, it'd be a
security improvement to boot).

Greetings,

Andres Freund
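The following is a minimal sketch of the general "force resolution before unblocking signals" idea. It assumes a POSIX system where select() is reached through a lazily bound PLT entry. It is not the attached patch, which isn't reproduced here. The function names prebind_select() and startup_with_prebound_select() are hypothetical. The code makes one select() call with no descriptors and a zero timeout while every signal is blocked. That call returns immediately, but it makes the loader resolve the PLT entry before any handler can run.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: make one select() call that returns immediately,
 * so that under lazy binding the dynamic loader resolves select()'s PLT
 * entry now rather than at the first "real" wait.
 */
static int
prebind_select(void)
{
    struct timeval tv = {0, 0};    /* zero timeout: poll, never sleep */
    int rc;

    /* No descriptors to watch; only the call itself matters here. */
    rc = select(0, NULL, NULL, NULL, &tv);

    /* With nfds == 0, select() can never report a ready descriptor. */
    assert(rc <= 0);
    return rc;
}

/*
 * Hypothetical startup sequence: block all signals, warm up the PLT
 * entry, then restore the previous mask.  Returns 0 on success, -1 on
 * failure.  (SIGKILL and SIGSTOP are silently left unblocked by the OS.)
 */
int
startup_with_prebound_select(void)
{
    sigset_t all, old;
    int rc;

    sigfillset(&all);
    if (sigprocmask(SIG_SETMASK, &all, &old) != 0)
        return -1;

    /* No handler can run here, so the loader's lock can't be re-entered. */
    rc = prebind_select();

    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0)
        return -1;

    return rc == 0 ? 0 : -1;
}
```

As the question above suggests, this trick only covers select(). Linking with -Wl,-z,now would resolve every PLT entry at load time, so no function would need a warm-up call.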