Hi,

On 2021-05-05 18:34:36 +0200, Magnus Hagander wrote:
> Is this really a problem we should fix ourselves? Most daemon-managers
> today will happily be configured to automatically restart a daemon on
> failure with a single setting since a long time now. E.g. in systemd
> (which most linuxen uses now) you just set Restart=on-failure (or
> maybe even Restart=always) and something like RestartSec=10.

I'm not convinced by this, for two main reasons:

1) Our own code can know a lot more about the different error types than we
   can signal to systemd. The retry timeouts for e.g. a connection failure
   (whatever) are different than for fsync failing (alarm alarm). If we run
   out of space we might want to clean up space / invoke a command to do so,
   but there's nothing equivalent for systemd.

2) Do we really want to either implement at least 3 different ways to do
   this kind of thing, or force users to do it over and over again?

That's not to say that there's no space for handling "unexpected" errors
outside of postgres binaries, but I think it's pretty obvious that that
doesn't cover somewhat predictable types of errors.

And looking at the server side of things - it is *not* the same for systemd
to restart postgres as for postmaster to do so internally. The latter can
hold onto shared memory. Which e.g. with simple huge_pages configurations is
crucial, because it prevents other processes from using that shared memory.
And it accelerates restart by a lot - the kernel needing to zero shared
memory on first access (or allocation) can be a very significant penalty.

Greetings,

Andres Freund
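
To make point 1) concrete, here is a minimal sketch of what per-error-type retry handling could look like, as opposed to a single RestartSec. Everything in it is hypothetical and for illustration only: the names FailureKind, RetryPolicy and policy_for, the delay values, the placeholder cleanup command, and the choice not to retry automatically after an fsync failure are all assumptions, not anything that exists in postgres.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FailureKind(Enum):
    """Hypothetical failure classes, taken from the examples in the mail."""
    CONNECTION_FAILURE = auto()
    OUT_OF_SPACE = auto()
    FSYNC_FAILURE = auto()


@dataclass(frozen=True)
class RetryPolicy:
    retry: bool                            # retry automatically at all?
    delay_seconds: float                   # wait before the next attempt
    cleanup_command: Optional[str] = None  # command to run before retrying


# Illustrative values only; the point is that they differ per error type.
POLICIES = {
    # Harmless, transient: retry quickly.
    FailureKind.CONNECTION_FAILURE: RetryPolicy(retry=True, delay_seconds=5.0),
    # Retrying only makes sense after trying to free space
    # (the command name is a placeholder).
    FailureKind.OUT_OF_SPACE: RetryPolicy(
        retry=True, delay_seconds=60.0, cleanup_command="free-up-space"
    ),
    # Assumption: too serious to retry blindly.
    FailureKind.FSYNC_FAILURE: RetryPolicy(retry=False, delay_seconds=0.0),
}


def policy_for(kind: FailureKind) -> RetryPolicy:
    """Look up how a given class of failure should be handled."""
    return POLICIES[kind]
```

A generic supervisor only sees that the process exited, so it can apply one of these rows at most; the program itself knows which row applies.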