bug#47253: network-manager shepherd services does not wait to be online

raid5atemyhomework via Bug reports for GNU Guix Sat, 20 Mar 2021 03:16:11 -0700

Hello Mark,

> [...] I'll note, however, that merely waiting up to 30 seconds (or whatever
> timeout you choose) is not, in itself, a robust solution. What
> happens if the network is down for more than 30 seconds? What if it
> goes down after 'nm-online' checks, but before the dependent service has
> finished starting?


The sysad has to go look at what is wrong and fix it, then restart services 
manually as needed.  Presumably the sysad is competent enough to care for the 
hardware so this doesn't occur (too often).

What this avoids is the case where everything in the hardware setup (cables,
NIC, router, hub, router config, etc.) is 100% fine, but a reboot of the system
for any reason causes services starting at boot to fail to start properly.
Competent sysads will set up alarm bells for when an important daemon is not
running.  But if such alarm bells keep getting set off during a server restart,
it gets annoying and makes the sysad pay less attention to alarm bells that
*are* important enough for them to check the hardware setup.

So the common 30-second timeout used in systemd is a fairly good compromise
anyway.  Probably your alarm bells check things hourly or so, and exiting
after 30 seconds allows other services (e.g. an X server running directly on
the server, perhaps?) to start as well, so a sysad can sit at the console and
work the issue directly.  It's not perfect, but it's good enough for most things.
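
To make the idea concrete, here is a rough sketch of the "wait, but only up to a
timeout" behaviour.  It is Python rather than Shepherd code, and the names
`wait_until`, `predicate`, `timeout` and `interval` are made up for
illustration; `predicate` stands in for whatever connectivity check is used
(something like running `nm-online`).  It is not how any existing service is
implemented.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0):
    """Poll predicate() until it returns true or `timeout` seconds elapse.

    Returns True if the condition was met in time, False if we gave up.
    Either way the caller gets control back, so boot is never blocked
    for longer than roughly `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

The caller would log a warning when this returns False and then carry on
starting the remaining services, so the console is still usable while the
sysad investigates.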

> Also, if a service fails to handle lack of network
> when it starts, it makes me wonder whether it properly handles a
> prolonged network failure while it's running. It seems to me that the
> only fully satisfactory solution is for each service to robustly handle
> network failures at any time, although I acknowledge that workarounds
> are needed in the meantime.

Indeed, and the Guix substituter for example is fairly brittle against internet
connectivity problems, not just at the local networking level, but anywhere
along the path from the local network connection all the way to ci.guix.gnu.org.

Thanks
raid5atemyhomework
