Bug#904558: What should happen when maintscripts fail to restart a service

Simon McVittie Wed, 17 Oct 2018 13:51:46 -0700

On Tue, 09 Oct 2018 at 20:35:33 +0200, Wouter Verhelst wrote:
> According to "man invoke-rc.d", policy-rc.d can exit with exit state 106
> and provide a number of actions on stdout. These are then actions that
> invoke-rc.d must try in order "until one of them succeeds". As such, a
> policy-rc.d implementation written like so:
> 
> #!/bin/sh
> 
> if [ "$1" = ssh ]             # logic error fixed as per subsequent mail
> then
>       exit 0
> fi
> echo "$2 stop"
> exit 106
> 
> would result in the system attempting whatever init script action was
> being asked for, followed by a "stop" action (except in the case of the
> "ssh" service, which must not fail before we close a shell, ever). This
> assumes that a "stop" action when the daemon fails to start will be
> successful


If I'm reading invoke-rc.d correctly, this is implemented (in a cross-init
way), but probably doesn't interact well with the logic that avoids
(re)starting services that are disabled, because that doesn't consider
"restart stop" to match "restart".

Obviously, if I'm right about that limitation, then that's a bug, and
bugs can be fixed. However, it makes me concerned that the exit status
106 thing is not well-understood or well-tested, even by invoke-rc.d
maintainers.

Packages that have systemd units with no corresponding LSB init
script (not necessarily services - timer, socket, path and (auto)mount
units are also units) use deb-systemd-invoke instead of
invoke-rc.d. deb-systemd-invoke doesn't implement the full generality
of the policy-rc.d interface, but only 0, 101 and 104 (in particular
not 106). Adding support for 106 would be a reasonable feature request,
particularly if we want to encourage this route, but it isn't currently
implemented.

While discussing this on IRC we wondered whether maintainer scripts
that restart services should normally be using an interface that is
analogous to "systemctl try-restart", namely: check whether the service
is running, then restart it if it was. (This can't work for maintainer
scripts that stop the service in prerm and start it in postinst, but
that is no longer the default behaviour in recent debhelper compat
levels.) However, both dh_installinit and dh_installsystemd currently
use plain "restart", so if the service is not running (possibly because
it's already broken), it will usually be started.
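
To make the try-restart idea concrete, here is a minimal sketch of what
such a helper might look like. It is not something debhelper generates
today. The try_restart function and the INVOKE override variable are
hypothetical names used only for illustration, and the sketch assumes
the init script implements an LSB-style "status" action that exits 0
when the service is running.

```shell
#!/bin/sh
# Sketch only: a "try-restart" analogue for maintainer scripts.
# INVOKE is a hypothetical override hook so that the helper can be
# exercised without a real init system; it defaults to invoke-rc.d.
INVOKE="${INVOKE:-invoke-rc.d}"

# try_restart SERVICE
# Restart SERVICE only if its "status" action reports it as running
# (assumption: exit status 0 from "status" means "running").
try_restart() {
    service="$1"
    if "$INVOKE" "$service" status >/dev/null 2>&1; then
        "$INVOKE" "$service" restart
    else
        # Not running (possibly already broken): do not start it.
        echo "$service is not running; leaving it stopped"
    fi
}
```

A postinst would call this as "try_restart foo" where it currently
calls "invoke-rc.d foo restart". A failed restart of a running service
still propagates as a non-zero exit status, so the question of what
the maintainer script should do in that case remains open.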

> With that background, IMHO the proper reply to this question before the
> committee is that yes, postinst scripts should fail when an init script
> fails, but we should also better document the policy-rc.d interface to
> point out that the above is possible and can be done where it makes
> sense.

This would solve Marga's use case with a very large fleet of machines
maintained by a small number of sysadmins: they can install a policy-rc.d
on all those machines that does the right thing.

However, it leaves the default as "fail hard", which I'm not convinced
is the most appropriate thing for systems that lack an experienced
sysadmin (which are the systems where defaults matter most, because an
inexperienced user is the least able to make an informed decision about
where they should deviate from defaults).

policy-rc.d also has some practical integration issues. It normally relies
on putting an unpackaged file in /usr/sbin (unless you have installed
policyrcd-script-zg2), and it's common for tools like debootstrap and
debian-installer to create and delete policy-rc.d to suppress service
startup while carrying out bootstrap operations. One Debian derivative
that I'm involved in (SteamOS) is *meant* to have a policy-rc.d, but we
recently discovered that it has always been deleted at the end of the
debian-installer run, and so doesn't exist in practice.
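
The create-and-delete pattern can be sketched roughly as follows. This
is an illustration rather than the actual code of debootstrap or
debian-installer: the function names are hypothetical, and the
"exit 101" body (action forbidden by policy) is an assumption about
what such a temporary policy-rc.d typically contains.

```shell
#!/bin/sh
# Sketch only: how a bootstrap tool might suppress service startup.
# Function names are hypothetical; ROOT is the target filesystem tree.

# suppress_service_startup ROOT
# Write a policy-rc.d that forbids every action. This overwrites any
# policy-rc.d that already exists in ROOT.
suppress_service_startup() {
    root="$1"
    mkdir -p "$root/usr/sbin"
    printf '#!/bin/sh\nexit 101\n' > "$root/usr/sbin/policy-rc.d"
    chmod 755 "$root/usr/sbin/policy-rc.d"
}

# allow_service_startup ROOT
# Remove the policy-rc.d again. Nothing is restored, so a policy-rc.d
# that was present beforehand is lost.
allow_service_startup() {
    root="$1"
    rm -f "$root/usr/sbin/policy-rc.d"
}
```

With this pattern, a policy-rc.d that a derivative ships on purpose
does not survive the suppress/allow cycle, which matches what we saw
in SteamOS.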

    smcv
