On Fri, Jan 26, 2018 at 6:40 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Thanks for looking into this. Yeah. I think you're right that it > could add a bit of overhead in some cases (ie if you receive a lot of > signals that AREN'T caused by fork failure, then you'll enter > HandleParallelMessage() every time unnecessarily), and it does feel a > bit kludgy. The best idea I have to fix that so far is like this: (1) > add a member fork_failure_count to struct BackgroundWorkerArray, (2) > in do_start_bgworker() whenever fork fails, do > ++BackgroundWorkerData->fork_failure_count (ie before a signal is sent > to the leader), (3) in procsignal_sigusr1_handler where we normally do > a bunch of CheckProcSignal(PROCSIG_XXX) stuff, if > (BackgroundWorkerData->fork_failure_count != > last_observed_fork_failure_count) HandleParallelMessageInterrupt(). > As far as I know, as long as fork_failure_count is (say) int32 (ie not > prone to tearing) then no locking is required due to the barriers > implicit in the syscalls involved there. This is still slightly more > pessimistic than it needs to be (the failed fork may be for someone > else's ParallelContext), but only in rare cases so it would be > practically as good as precise PROCSIG delivery. It's just that we > aren't allowed to deliver PROCSIGs from the postmaster. We are > allowed to communicate through BackgroundWorkerData, and there is a > precedent for cluster-visible event counters in there already.
I could sign on to that plan, but I don't think we should hold this patch up for it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company