Here is a condensation of the discussion prior to filing this bug. I have removed all quotes of previous messages (e.g. msg [2] contained quotes from msg [1] that have been removed). I have tried to identify when a message is a reply to parts of several messages or to a message that is not the previous message in this digest. When in doubt, refer to the original message.
[1] https://lists.debian.org/debian-devel/2015/09/msg00496.html
* Marvin Renich <m...@renich.org> [150923 13:53]:
> <rant>
>
> From the first time I had dpkg mark a package as half-configured when
> everything was correct except that the service would not start for some
> reason that had nothing to do with package installation (exactly the
> situation here for virtualbox), I have felt that dpkg had no business
> failing just because the service would not start. I think that is a
> wrong design decision.
>
> In fact, one specific case that often hurts me is when I have xen
> installed on a machine where I only run the hypervisor occasionally.
> Upgrading the xen packages causes (or has caused in the past) the
> upgrade to fail. This is ridiculous!
>
> I think it should be documented in the developers reference that if you
> attempt to start or restart a service in postinst, you should guard it
> so that a failure in the service does not propagate to a failure of the
> postinst.
>
> </rant>

[2] https://lists.debian.org/debian-devel/2015/09/msg00508.html
* Jeroen Dekkers <jer...@dekkers.ch> [150924 07:23]:
> But then when something goes wrong when upgrading and the service
> doesn't (re)start apt/dpkg will report success but the service isn't
> running anymore. That also sounds wrong to me. Letting postinst fail
> might not be the best way to signal this, but to change that we need
> something else to let the user know that something went wrong. Just
> printing an error message isn't enough, because the user might not see
> that (for example when multiple packages are installed/upgraded and a
> later package asks some questions using dialog or when using
> unattended-upgrades).

[3] https://lists.debian.org/debian-devel/2015/09/msg00511.html
* Marvin Renich <m...@renich.org> [150924 08:12]:
> How does failing the upgrade solve anything?
> The upgrade should only
> fail if the failure of the service to start was because something in the
> upgrade itself was broken; this is rarely the case.
>
> There are two prominent reasons why a service fails to start after an
> upgrade: the relationship between the application and its configuration
> has changed (e.g. different, incompatible defaults or incompatible file
> format) or some external influence that has nothing to do with the
> upgrade (e.g. unavailable resource).
>
> The first case requires the admin to sort out the changes and fix the
> configuration. Being required to re-run the dpkg installation just to
> flip the 'half-configured' state to 'installed' when the result would
> have been the same if dpkg had not failed the first time is wrong.
>
> In the second case, how is it a dpkg installation failure if the
> hypervisor is not running so xen won't start? Everything is installed
> perfectly. Or if a daemon fails to start because the ldap server on a
> different host is down? Failing the installation is _really_, _really_
> _wrong_!
>
> What makes this even worse is that when installing or upgrading a large
> number of packages, this kind of incorrect failure sometimes affects
> many completely unrelated packages. For an unattended upgrade, this is
> so much worse than having one service that (for a correct reason)
> refused to restart after the upgrade.
>
> What you are looking for is a more prominent notification that a service
> did not restart. But the current situation is like the "check engine"
> light flashing when you are low on fuel; yes, it gets your attention,
> but it is telling you the wrong thing.

[4] https://lists.debian.org/debian-devel/2015/09/msg00518.html
* Henrique de Moraes Holschuh <h...@debian.org> [150924 12:21]:
> What we really want is a "do not fail upgrade, BUT report that some services
> *that were previously running* failed to restart after the upgrade run".
>
> ESPECIALLY if you are going to take "unattended upgrades" seriously.
>
> Still, that would need some proper design work, and a reasonable amount of
> code to be written and tested. Some of it will hook into the package
> system, some of it needs to interface to the services subsystem (systemd,
> sysvinit, others).

[5] https://lists.debian.org/debian-devel/2015/09/msg00519.html
* Paul Gevers <elb...@debian.org> [150924 14:12]:
> I would like to add there is more than just services. As the current
> maintainer of dbconfig-common, it is more than clear to me that updates
> of packages that require updates of (and even installs into) databases
> (tables and/or their contents) also fall into this category. If for
> whatever reason we can't connect to the database (which may even be on a
> different system), there is currently not much that we can do except
> register failure. I am currently of the opinion that if that happens,
> the package upgrade DID fail, as the package probably won't be working
> until the upgrade commands are applied with a working connection. (Just
> before people start shouting, the way dbconfig-common handles this is by
> asking the administrator if the problem should be fixed by retrying,
> ignoring the problem or considering the issue a failure. In
> noninteractive mode, the problem is ignored for installs and removals,
> but not for upgrades.)

[6] https://lists.debian.org/debian-devel/2015/09/msg00525.html
* Marvin Renich <m...@renich.org> [150925 08:27]:
> [responding to Henrique de Moraes Holschuh [4]]
> I agree, but I don't think we should wait for this feature to appear
> before fixing packages to _not_ fail upgrades when the service fails to
> start. The current situation does more harm than good.
>
> [responding to Paul Gevers [5]]
> I agree completely.
> The decision on whether or not to fail the dpkg
> installation should depend on what action needs to be taken to correct
> the situation (and this is true whether we are talking about a service
> failing to start or a database upgrade failure or something else).
>
> However, most existing cases of service restart failures require
> something other than re-running the dpkg installation to fix them, and
> the default, without careful thought by the maintainer about the
> possible failure modes, should be to allow the dpkg run to succeed.
>
> Should I open a wishlist bug against the developers reference pointing
> to this discussion?

[7] https://lists.debian.org/debian-devel/2015/09/msg00532.html
* Eduard Bloch <e...@gmx.de> [150926 05:25]:
> I am wondering why this topic doesn't get more attention. For me, it
> feels like being one of the top causes of breaking an upgrade process
> somewhere in between, leaving the system in some intermediate state...
> with modern APT, it has become easier to continue from this messy
> situation but it's still a situation I would like to avoid.
>
> The basic idea might be that a package should be able to handle
> startup failures in different categories (and resolution strategies),
> defined by maintainers. However, it's not so easy because of
> subsequent errors that might happen in other services far away in the
> dependency chain, and it's hard to predict them all.
>
> We need some compromise here. Something I imagine is:
>
> a) packages that participate in the "error-tolerant" scheme get some
> attribute set. They also run delicate commands through a wrapper command
> that collects the failure/success state and records TODO tickets in some
> global configuration file.
>
> b) apt might add additional hints to the package installation, letting
> maintainer scripts know whether there are dependent packages somewhere
> in the chain.
>
> c) for failed tasks, dpkg and apt frontends show the user messages
> "there are things to fix that require your attention: <list of issues>",
> and when the admin solved the problem, he can close the ticket with the
> imaginary tool.

[8] https://lists.debian.org/debian-devel/2015/09/msg00542.html
* Jeroen Dekkers <jer...@dekkers.ch> [150926 09:44]:
> [responding to Marvin Renich [3]]
> I think it solves the problem of notifying the user that something
> went wrong quite clearly. Not in the correct way, I agree with that,
> but the solution to that should be to notify the user in a better way,
> not to stop notifying the user. Failing silently is worse than failing
> in the wrong way.
>
> Unattended-upgrades has the MinimalSteps option that splits upgrades
> in the smallest possible chunks so that isn't really a problem.
>
> Yes, but the way to solve that is to flash a "low on fuel" light, not
> to stop notifying you and leaving you alone in the desert without
> fuel. And if a "low on fuel" light isn't possible, it's better to keep
> flashing the "check engine" light like it has been doing for the past
> 15 years.
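For illustration, here is a minimal sketch of the kind of guard proposed in [1], written as a POSIX sh postinst fragment. It assumes a hypothetical service named "myservice" and a hypothetical helper function guarded_restart; neither comes from any real package or from the developers reference. It only prints a warning when the restart fails, which, as [2] and [8] point out, the admin may never see. So it covers the "do not fail upgrade" half of [4] but not the "report" half.

```sh
#!/bin/sh
# Hypothetical postinst fragment for an imaginary package "myservice".
# Sketch only: not taken from any real package.
set -e

# guarded_restart (hypothetical helper): run a service-control command,
# but do not let its failure abort the postinst under `set -e`.
# A command tested by `if` does not trigger `set -e`, so a failure only
# produces a warning on stderr and the function still returns 0.
guarded_restart() {
    if ! "$@"; then
        echo "WARNING: '$*' failed. The package is configured, but the" \
             "service may not be running; please investigate." >&2
    fi
    return 0
}

case "$1" in
    configure)
        # invoke-rc.d is Debian's policy-layer wrapper for init scripts;
        # skip the restart entirely if it is not available.
        if command -v invoke-rc.d >/dev/null 2>&1; then
            guarded_restart invoke-rc.d myservice restart
        fi
        ;;
esac
```

The same effect can be had more tersely with `invoke-rc.d myservice restart || true`, at the cost of losing even the warning.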