Andres Freund <and...@anarazel.de> writes:
> On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
>> My first reaction was that that sounded like a lot more work than removing
>> two lines from maybe_start_bgworker and adjusting some comments.  But on
>> closer inspection, the slow-bgworker-start issue isn't the only problem
>> here.
> FWIW, I vaguely remember somewhat related issues on x86/linux too.

After sleeping and thinking more, I've realized that the
slow-bgworker-start issue actually exists on *every* platform; it's just
harder to hit when select() is interruptible.  But consider the case where
multiple bgworker-start requests arrive while ServerLoop is actively
executing (perhaps because a connection request just came in).  The
postmaster has signals blocked, so nothing happens for the moment.
When we go around the loop and reach

            PG_SETMASK(&UnBlockSig);

the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
bgworker start requests, but services just one of them.  Then control
returns and proceeds to

            selres = select(nSockets, &rmask, NULL, NULL, &timeout);

But now there's no interrupt pending.  So the remaining start requests
do not get serviced until (a) some other postmaster interrupt arrives,
or (b) the one-minute timeout elapses.  They could be waiting a while.

Bottom line is that any request for more than one bgworker at a time
faces a non-negligible risk of suffering serious latency.

I'm coming back to the idea that at least in the back branches, the
thing to do is allow maybe_start_bgworker to start multiple workers.
Is there any actual evidence for the claim that that might have bad
side effects?

			regards, tom lane
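
To put rough numbers on the scenario above, here is a toy model of it.
This is not postmaster code: simulate_startup_delay, SELECT_TIMEOUT_SECS
and max_per_call are names made up for illustration.  The model assumes
that no other postmaster interrupt arrives, and that each select()
timeout leads to exactly one more call of the start routine.

```c
#include <assert.h>

/* Assumed value: the one-minute select() timeout mentioned in the text. */
enum { SELECT_TIMEOUT_SECS = 60 };

/*
 * Toy model: `requests` bgworker-start requests arrive together while
 * signals are blocked.  Each call of the (hypothetical) start routine
 * launches at most `max_per_call` workers.  Returns the simulated number
 * of seconds until the last requested worker has been started.
 */
int
simulate_startup_delay(int requests, int max_per_call)
{
    int pending = requests;
    int elapsed = 0;
    int started;

    assert(requests >= 0 && max_per_call > 0);

    /* Signals are unblocked: the single pending SIGUSR1 is delivered,
     * and the handler makes one call to the start routine. */
    started = (pending < max_per_call) ? pending : max_per_call;
    pending -= started;

    /* No interrupt is pending any more, so every further call has to
     * wait for select() to time out first. */
    while (pending > 0)
    {
        elapsed += SELECT_TIMEOUT_SECS;
        started = (pending < max_per_call) ? pending : max_per_call;
        pending -= started;
    }

    return elapsed;
}
```

Under those assumptions, simulate_startup_delay(3, 1) returns 120 (the
third worker waits through two timeouts), while
simulate_startup_delay(3, 3) returns 0, which is the effect of letting
one call start multiple workers.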