On Wed, Jan 24, 2018 at 5:25 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> If there were some way for the postmaster to cause reason
> PROCSIG_PARALLEL_MESSAGE to be set in the leader process instead of
> just notification via kill(SIGUSR1) when it fails to fork a parallel
> worker, we'd get (1) for free in any latch/CFI loop code.  But I
> understand that we can't do that by project edict.

Based on the above observation, here is a terrible idea you'll all hate. It is pessimistic and expensive: it treats every latch wake as possibly being the postmaster telling us that it has failed to fork() a parallel worker, until we've seen a sign of life on every worker's error queue. Untested illustration code only.

This is the only way I've come up with to discover fork failure in any latch/CFI loop (i.e., without requiring client code to explicitly try to read either error or tuple queues).

-- 
Thomas Munro
http://www.enterprisedb.com
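
For readers without the attachment, the shape of the idea can be sketched as a small standalone model. This is not the attached patch and it uses no PostgreSQL APIs: `LeaderState`, `WorkerSlot`, `WorkerStatus`, `leader_init`, `leader_note_message` and `leader_on_latch_wake` are all hypothetical names invented for illustration. The sketch assumes the leader can ask for the postmaster's view of each worker, and it counts those lookups to show where the cost comes from.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WORKERS 8

/* Hypothetical model of the postmaster's view of one worker. */
typedef enum
{
    WORKER_NOT_YET_STARTED,
    WORKER_RUNNING,
    WORKER_FORK_FAILED
} WorkerStatus;

typedef struct
{
    WorkerStatus status;            /* what the postmaster would report */
    bool        seen_on_error_queue;    /* leader has seen a sign of life */
} WorkerSlot;

typedef struct
{
    WorkerSlot  workers[MAX_WORKERS];
    int         nworkers;
    int         nunconfirmed;   /* workers with no sign of life yet */
    int         status_checks;  /* lookups performed so far (the cost) */
} LeaderState;

/* Start with every worker requested but unconfirmed. */
void
leader_init(LeaderState *ls, int nworkers)
{
    assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
    ls->nworkers = nworkers;
    ls->nunconfirmed = nworkers;
    ls->status_checks = 0;
    for (int i = 0; i < nworkers; i++)
    {
        ls->workers[i].status = WORKER_NOT_YET_STARTED;
        ls->workers[i].seen_on_error_queue = false;
    }
}

/* Record that worker i has sent something on its error queue. */
void
leader_note_message(LeaderState *ls, int i)
{
    assert(i >= 0 && i < ls->nworkers);
    if (!ls->workers[i].seen_on_error_queue)
    {
        ls->workers[i].seen_on_error_queue = true;
        ls->nunconfirmed--;
    }
}

/*
 * Called on every latch wake.  Pessimistically assume the wake might mean
 * "fork failed" and check each worker we have not yet heard from.
 * Returns true if a fork failure was found.
 */
bool
leader_on_latch_wake(LeaderState *ls)
{
    if (ls->nunconfirmed == 0)
        return false;           /* every worker is known alive: no cost */

    for (int i = 0; i < ls->nworkers; i++)
    {
        if (ls->workers[i].seen_on_error_queue)
            continue;
        ls->status_checks++;    /* one status lookup per silent worker */
        if (ls->workers[i].status == WORKER_FORK_FAILED)
            return true;
    }
    return false;
}
```

In this model the per-wake overhead shrinks as workers announce themselves and disappears once all of them have been heard from, which is the "until we've seen a sign of life on every worker's error queue" part of the description. A real implementation would raise an error in the leader rather than return a flag.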

Attachment: fork-failure-detection-idea.patch