Bug#801065: digest of debian-devel discussion leading to this bug

Marvin Renich Mon, 05 Oct 2015 15:10:09 -0700

Here is a condensation of the discussion prior to filing this bug.  I
have removed all quotes of previous messages (e.g. msg [2] contained
quotes from msg [1] that have been removed).  I have tried to identify
when a message is a reply to parts of several messages or to a message
that is not the previous message in this digest.  When in doubt, refer
to the original message.


[1] https://lists.debian.org/debian-devel/2015/09/msg00496.html
* Marvin Renich <m...@renich.org> [150923 13:53]:
> <rant>
> 
> From the first time I had dpkg mark a package as half-configured when
> everything was correct except that the service would not start for some
> reason that had nothing to do with package installation (exactly the
> situation here for virtualbox), I have felt that dpkg had no business
> failing just because the service would not start.  I think that is a
> wrong design decision.
> 
> In fact, one specific case that often hurts me is when I have xen
> installed on a machine where I only run the hypervisor occasionally.
> Upgrading the xen packages causes (or has caused in the past) the
> upgrade to fail.  This is ridiculous!
> 
> I think it should be documented in the developers reference that if you
> attempt to start or restart a service in postinst, you should guard it
> so that a failure in the service does not propagate to a failure of the
> postinst.
> 
> </rant>
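
[Digest note] The guard proposed in [1] can be illustrated with a minimal
sketch. It is written in Python only for illustration; real maintainer
scripts are usually shell. The names start_service_guarded and postinst
are hypothetical, and the start command is supplied by the caller rather
than taken from any real package.

```python
import subprocess
import sys


def start_service_guarded(command):
    """Run a service start command; report a failure but never raise.

    `command` is an argument list chosen by the caller (an assumption
    of this sketch, not a real package's start command).
    """
    try:
        result = subprocess.run(command, check=False)
    except OSError as exc:
        # The command could not be executed at all (e.g. missing binary).
        print(f"warning: could not run {command!r}: {exc}", file=sys.stderr)
        return False
    if result.returncode != 0:
        print(
            f"warning: {command!r} exited with status {result.returncode}",
            file=sys.stderr,
        )
        return False
    return True


def postinst(start_command):
    """Hypothetical postinst body: configuration succeeds either way."""
    # Package configuration steps would go here.
    start_service_guarded(start_command)
    # Return 0 so a service that will not start does not leave the
    # package half-configured.
    return 0
```

The warning goes to stderr only, which is exactly the weakness raised in
[2]: nothing here makes the failure prominent to the admin.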

[2] https://lists.debian.org/debian-devel/2015/09/msg00508.html
* Jeroen Dekkers <jer...@dekkers.ch> [150924 07:23]:
> But then when something goes wrong when upgrading and the service
> doesn't (re)start apt/dpkg will report success but the service isn't
> running anymore. That also sounds wrong to me. Letting postinst fail
> might not be the best way to signal this, but to change that we need
> something else to let the user know that something went wrong. Just
> printing an error message isn't enough, because the user might not see
> that (for example when multiple packages are installed/upgraded and a
> later package asks some questions using dialog or when using
> unattended-upgrades).

[3] https://lists.debian.org/debian-devel/2015/09/msg00511.html
* Marvin Renich <m...@renich.org> [150924 08:12]:
> How does failing the upgrade solve anything?  The upgrade should only
> fail if the failure of the service to start was because something in the
> upgrade itself was broken; this is rarely the case.
> 
> There are two prominent reasons why a service fails to start after an
> upgrade:  the relationship between the application and its configuration
> has changed (e.g. different, incompatible defaults or incompatible file
> format) or some external influence that has nothing to do with the
> upgrade (e.g.  unavailable resource).
> 
> The first case requires the admin to sort out the changes and fix the
> configuration.  Being required to re-run the dpkg installation just to
> flip the 'half-configured' state to 'installed' when the result would
> have been the same if dpkg had not failed the first time is wrong.
> 
> In the second case, how is it a dpkg installation failure if the
> hypervisor is not running so xen won't start?  Everything is installed
> perfectly.  Or if a daemon fails to start because the ldap server on a
> different host is down?  Failing the installation is _really_, _really_
> _wrong_!
> 
> What makes this even worse is that when installing or upgrading a large
> number of packages, this kind of incorrect failure sometimes affects
> many completely unrelated packages.  For an unattended upgrade, this is
> so much worse than having one service that (for a correct reason)
> refused to restart after the upgrade.
> 
> What you are looking for is a more prominent notification that a service
> did not restart.  But the current situation is like the "check engine"
> light flashing when you are low on fuel; yes, it gets your attention,
> but it is telling you the wrong thing.

[4] https://lists.debian.org/debian-devel/2015/09/msg00518.html
* Henrique de Moraes Holschuh <h...@debian.org> [150924 12:21]:
> What we really want is a "do not fail upgrade, BUT report that some services
> *that were previously running* failed to restart after the upgrade run".
> 
> ESPECIALLY if you are going to take "unattended upgrades" seriously.
> 
> Still, that would need some proper design work, and a reasonable amount of
> code to be written and tested.  Some of it will hook into the package
> system, some of it needs to interface to the services subsystem (systemd,
> sysvinit, others).

[5] https://lists.debian.org/debian-devel/2015/09/msg00519.html
* Paul Gevers <elb...@debian.org> [150924 14:12]:
> I would like to add there is more than just services. As the current
> maintainer of dbconfig-common, it is more than clear to me that updates
> of packages that require updates of (and even installs into) databases
> (tables and/or their contents) also fall into this category. If for
> whatever reason we can't connect to the database (which may even be on a
> different system), there is currently not much that we can do except
> register failure. I am currently of the opinion that if that happens,
> the package upgrade DID fail, as the package probably won't be working
> until the upgrade commands are applied with a working connection. (Just
> before people start shouting, the way dbconfig-common handles this is by
> asking the administrator if the problem should be fixed by retrying,
> ignoring the problem or considering the issue a failure. In
> noninteractive mode, the problem is ignored for installs and removals,
> but not for upgrades.)

[6] https://lists.debian.org/debian-devel/2015/09/msg00525.html
* Marvin Renich <m...@renich.org> [150925 08:27]:
> [responding to Henrique de Moraes Holschuh [4]]
> I agree, but I don't think we should wait for this feature to appear
> before fixing packages to _not_ fail upgrades when the service fails to
> start.  The current situation does more harm than good.
> 
> [responding to Paul Gevers [5]]
> I agree completely.  The decision on whether or not to fail the dpkg
> installation should depend on what action needs to be taken to correct
> the situation (and this is true whether we are talking about a service
> failing to start or a database upgrade failure or something else).
> 
> However, most existing cases of service restart failures require
> something other than re-running the dpkg installation to fix them, and
> the default, without careful thought by the maintainer about the
> possible failure modes, should be to allow the dpkg run to succeed.
> 
> Should I open a wishlist bug against the developers reference pointing
> to this discussion?

[7] https://lists.debian.org/debian-devel/2015/09/msg00532.html
* Eduard Bloch <e...@gmx.de> [150926 05:25]:
> I am wondering why this topic doesn't get more attention. For me, it
> feels like being one of the top causes of breaking an upgrade process
> somewhere in between, leaving the system in some intermediate state...
> with modern APT, it has become easier to continue from this messy
> situation but it's still a situation I would like to avoid.
> 
> The basic idea might be that a package should be able to handle
> startup failures in different categories (and resolution strategies),
> defined by maintainers. However, it's not so easy because of
> subsequent errors that might happen in other services far away in the
> dependency chain, and it's hard to predict them all.
> 
> We need some compromise here. Something I imagine is:
> 
> a) packages that participate in the "error-tolerant" scheme get some
> attribute set. They also run delicate commands through a wrapper command
> that collects the failure/success state and records TODO tickets in some
> global configuration file.
> 
> b) apt might add additional hints to the package installation, letting
> maintainer scripts know whether there are dependent packages somewhere
> in the chain.
> 
> c) for failed tasks, dpkg and apt frontends show the user messages
> "there are things to fix that require your attention: <list of issues>",
> and when the admin has solved the problem, he can close the ticket with the
> imaginary tool.
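
[Digest note] Points (a) and (c) of [7] can be sketched as follows. This
is only an illustration of the idea, not an existing dpkg or apt
facility: the JSON ticket file, its layout, and the functions
run_recorded, load_tickets, pending_report and close_ticket are all
hypothetical.

```python
import json
import subprocess
from pathlib import Path


def load_tickets(ticket_file):
    """Return the list of open tickets, or an empty list if none exist."""
    path = Path(ticket_file)
    return json.loads(path.read_text()) if path.exists() else []


def save_tickets(ticket_file, tickets):
    Path(ticket_file).write_text(json.dumps(tickets, indent=2))


def run_recorded(package, command, ticket_file):
    """Run a delicate command; on failure, record a TODO ticket.

    Returns True on success, False on failure. It never raises for a
    failing command, so the calling maintainer script can continue.
    """
    try:
        status = subprocess.run(command, check=False).returncode
    except OSError:
        status = 127  # conventional "command not found" status
    if status != 0:
        tickets = load_tickets(ticket_file)
        tickets.append(
            {"package": package, "command": list(command), "status": status}
        )
        save_tickets(ticket_file, tickets)
    return status == 0


def pending_report(ticket_file):
    """Build the message a frontend could show after the run."""
    tickets = load_tickets(ticket_file)
    if not tickets:
        return ""
    lines = ["there are things to fix that require your attention:"]
    for index, ticket in enumerate(tickets):
        lines.append(
            f"  [{index}] {ticket['package']}: "
            f"{' '.join(ticket['command'])} (status {ticket['status']})"
        )
    return "\n".join(lines)


def close_ticket(ticket_file, index):
    """Remove one ticket once the admin has solved the problem."""
    tickets = load_tickets(ticket_file)
    del tickets[index]
    save_tickets(ticket_file, tickets)
```

The sketch does not address point (b), nor the harder problem noted in
[7] of follow-on errors in dependent services.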

[8] https://lists.debian.org/debian-devel/2015/09/msg00542.html
* Jeroen Dekkers <jer...@dekkers.ch> [150926 09:44]:
> [responding to Marvin Renich [3]]
> I think it solves the problem of notifying the user that something
> went wrong quite clearly. Not in the correct way, I agree with that,
> but the solution to that should be to notify the user in a better way,
> not to stop notifying the user. Failing silently is worse than failing
> in the wrong way.
> 
> Unattended-upgrades has the MinimalSteps option that splits upgrades
> in the smallest possible chunks so that isn't really a problem.
> 
> Yes, but the way to solve that is to flash a "low on fuel" light, not
> to stop notifying you and leaving you alone in the desert without
> fuel. And if a "low on fuel" light isn't possible, it's better to keep
> flashing the "check engine" light like it has been doing for the past
> 15 years.
