On Fri, Dec 2, 2022 at 2:40 PM Andres Freund <and...@anarazel.de> wrote:
> On 2022-12-02 10:12:25 +1300, Thomas Munro wrote:
> > with a latch as the wakeup mechanism for "PM signals" (requests from
> > backends to do things like start a background worker, etc).
>
> Hm - is that directly related? ISTM that using a WES in the main loop, and
> changing pmsignal.c to a latch are somewhat separate things?
Yeah, that's a good question. This comes from a larger patch set where
my *goal* was to use latches everywhere possible for interprocess
wakeups, but it does indeed make a lot of sense to do the postmaster
WaitEventSet retrofit completely independently of that, leaving the
associated robustness problems for later proposals (the posted patch
clearly fails to solve them).

> I don't think b) is the case as the patch stands. Imagine some process
> overwriting pm_latch->owner_pid. That'd then break the SetLatch() in
> postmaster's signal handler, because it wouldn't realize that itself needs to
> be woken up, and we'd just signal some random process.

Right. At some point I had an idea about a non-shared table of latches
where OS-specific things like pids and HANDLEs live, so that only the
maybe_waiting and is_set flags are in shared memory, and even those are
ignored when accessing the latch in 'robust' mode (they're only
optimisations, after all). I didn't try it, though; first you might
have to switch to a model with a finite set of latches identified by
index, or something like that. But I like your idea of separating that
whole problem.

> It doesn't seem trivial (but not impossible either) to make SetLatch() robust
> against arbitrary corruption. So it seems easier to me to just put the latch
> in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler.

Alright, good idea, I'll do a v2 like that.
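
For the record, the shape I have in mind for v2 is roughly this
(untested sketch; the *_sketch names are invented for illustration, and
the real patch also has to cover the listen sockets, dead-end children,
etc.):

static Latch pm_local_latch;    /* postmaster-private, not in shmem */
static WaitEventSet *pm_wait_set;

static void
sigusr1_handler_sketch(SIGNAL_ARGS)
{
    int save_errno = errno;

    /*
     * SetLatch() is async-signal-safe, and this latch lives in local
     * memory, so nothing here trusts shared state; the pmsignal.c
     * reason flags are read later, from the main loop.
     */
    SetLatch(&pm_local_latch);

    errno = save_errno;
}

static void
ServerLoop_sketch(void)
{
    WaitEvent event;

    InitLatch(&pm_local_latch);
    pm_wait_set = CreateWaitEventSet(CurrentMemoryContext, 1);
    AddWaitEventToSet(pm_wait_set, WL_LATCH_SET, PGINVALID_SOCKET,
                      &pm_local_latch, NULL);

    for (;;)
    {
        (void) WaitEventSetWait(pm_wait_set, -1L /* block */, &event, 1,
                                0 /* wait_event_info */);
        if (event.events & WL_LATCH_SET)
        {
            ResetLatch(&pm_local_latch);
            /* service pmsignal requests, etc., here */
        }
    }
}

That way SetLatch() never dereferences shared memory at all, so a
backend scribbling on shmem can't redirect or lose the postmaster's
wakeups.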