[Kernel-packages] [Bug 1463120] Re: Failure to boot if fstab disk mounts fail

Cecil Curry Tue, 20 Sep 2016 23:51:49 -0700

I can confirm this affects all Ubuntu >= 15.04 installations, including
Ubuntu 16.04 (LTS).

> I am reasonably sure that this is not a kernel issue but a boot script
> dependency issue...

That's absolutely the case. Your intuition has not led you astray.

> Hopefully someone can point me in the right direction?

To no one's surprise, this is a high-level systemd issue rather than a
low-level kernel issue. Hence the cutoff at Ubuntu 15.04, the release in
which Ubuntu switched from Upstart to systemd.

This issue is a significant show-stopper, particularly for users on
newer UEFI-based systems. Triggering this issue is disturbingly trivial
on such systems. Explicitly editing the "/etc/fstab" file with superuser
permissions is *NOT* required to trigger this issue. Formatting a single
device with the builtin GUI-driven disk utility ("Disks") is all it
takes. And when it takes, most users will have no recourse but to format
their root filesystem and reinstall.

Here's the use case my better half stumbled into this morning. During
Ubuntu installation, users may elect to automount devices to arbitrary
mountpoints when selecting a custom partition scheme. When this is done
on UEFI-based systems, the resulting "/etc/fstab" entries resemble:

UUID=050e1e34-39e6-4072-a03e-ae0bf90ba13a /home/waluigi/wah! ext4 defaults 0 2

If any such device is subsequently reformatted (e.g., via the stock
"Disks" utility), that device's UUID will be arbitrarily changed,
invalidating the UUID previously recorded for that device in
"/etc/fstab". Everything will superficially appear to behave as
expected. On the next reboot, however, the end user will be presented
with the now-infamous Purple Screen of Death. No error messages
(...human-readable or otherwise) will appear, obscuring the underlying
issue.
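
For what it's worth, the sanity check that nothing performs before that
fateful reboot is small: compare the UUIDs named in "/etc/fstab" against the
UUIDs the system can actually see. Here is a minimal Python sketch of that
check. The function name and the sample data are hypothetical; on a live
system the set of known UUIDs could presumably be taken from the entries of
"/dev/disk/by-uuid".

```python
def stale_uuid_entries(fstab_text, known_uuids):
    """Return (uuid, mountpoint) pairs whose UUID is not in known_uuids.

    fstab_text  -- the contents of an fstab file, as a string
    known_uuids -- set of filesystem UUIDs currently present on the system
                   (assumption: e.g. set(os.listdir("/dev/disk/by-uuid")))
    """
    stale = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Only entries identified by UUID can go stale in this particular way.
        if len(fields) < 2 or not fields[0].startswith("UUID="):
            continue
        uuid = fields[0][len("UUID="):].strip('"')
        if uuid not in known_uuids:
            stale.append((uuid, fields[1]))
    return stale


# Illustrative data only: the second UUID stands in for a reformatted device.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=53f2fcea-e84c-4736-ac69-f34305a63432 / ext4 errors=remount-ro 0 1
UUID=050e1e34-39e6-4072-a03e-ae0bf90ba13a /home/waluigi/wah! ext4 defaults 0 2
"""

if __name__ == "__main__":
    present = {"53f2fcea-e84c-4736-ac69-f34305a63432"}
    for uuid, mountpoint in stale_uuid_entries(SAMPLE_FSTAB, present):
        print("stale fstab entry:", uuid, "->", mountpoint)
```

Anything this reports is an entry that will fail to mount at the next boot.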

Attempting to log in with "safe mode" fails. Attempting to log in to a
root terminal succeeds, albeit only after a few dispiriting rounds of
graphical corruption and terminal cursors disappearing. If the user
successfully navigates this shameful gauntlet of pain *AND* is
sufficiently familiar with low-level system administration to navigate
the crucible by geek fire that is the CLI, there still exists no
explicit indication of the core issue. (Command-line-fu: why have you
betrayed me?)

No relevant warnings or errors appear in either "dmesg" or "journalctl"
output *OR* in "/var/log" logfiles. The only clear indicator that I
could grep across was a single line of "systemctl -a" output showing the
status of some ignorable system service unit with a non-human-readable
name to be "loaded inactive dead", which seemed dimly suspicious.
Further hours of cursing and grepping yielded the final culprit. My
precious sanity. I have less of it now.

Non-tech-savvy users confronted with this problem will probably just
vomit, format, and reinstall. As a Gentoo-hardened Disciple of the
Command-line Faith with fifteen caffeine-addled years' experience in
Silicon Valley startup monoculture, I feel overly confident in betting
that even the most battle-weary code warriors will hand in their Richard
M. Stallman fan club memberships when face-planting into this epic fail.
Consider escalating this issue's importance to at least "High" –
possibly even "Critical."

The systemd unit "NetworkManager.service" is the culprit. For reasons patently
unclear to me [read: "Why I Hate systemd and You Should Too"], this
service unconditionally transitively depends on the mountability of
*ALL* automounts listed in "/etc/fstab" -- regardless of whether these
automounts are actually used and hence required at startup. We can
hopefully agree that "/home/waluigi/wah!" is a non-essential automount.
Waluigi begs to disagree, of course.

The systemd generator
"/usr/lib/systemd/system-generators/systemd-fstab-generator" appears to
auto-generate one mount unit for each mountpoint listed in "/etc/fstab".
The failure of even a single such unit is sufficient to bring the entire
boot process to its RedHat-stained knees. The fragility of systemd was
already legendary. This contemptible breakage only cements its
well-deserved ill repute. For shame, Poettering.
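
As an aside, the "non-human-readable" unit names have a rule behind them:
each generated mount unit is named after its escaped mount path. Below is a
rough Python approximation of that path escaping as I understand it from
systemd.unit(5). The function name is hypothetical, and the real
"systemd-escape --path --suffix=mount" remains the authority (it also
normalises paths in ways this sketch does not attempt).

```python
# Bytes that survive escaping unchanged: ASCII alphanumerics plus ":", "_", ".".
_ALLOWED = set(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789:_."
)


def mount_unit_name(mountpoint):
    """Approximate the mount unit name systemd derives from a mount path."""
    # Leading, trailing and duplicate slashes are dropped before escaping.
    parts = [p for p in mountpoint.split("/") if p]
    if not parts:
        return "-.mount"  # the root directory is encoded as a single dash
    path = "/".join(parts)

    out = []
    for i, byte in enumerate(path.encode("utf-8")):
        if byte == ord("/"):
            out.append("-")
        elif byte in _ALLOWED and not (i == 0 and byte == ord(".")):
            out.append(chr(byte))
        else:
            # Everything else, including "-" itself, becomes a \xNN escape.
            out.append("\\x%02x" % byte)
    return "".join(out) + ".mount"


if __name__ == "__main__":
    for path in ("/", "/mnt/sdd1", "/home/waluigi/wah!"):
        print(path, "->", mount_unit_name(path))
```

With a name in hand, "systemctl status" on that unit is a reasonable first
place to look when a mount is suspected of failing.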

To be fair, this issue is the product of numerous cascading failures –
including:

* The failure of the stock "Disks" utility to detect and handle attempts to
  format UUID-style devices automounted by "/etc/fstab". On detecting such an
  attempt, this utility should (in descending order of preference):
  * Globally search-and-replace the UUID field of all entries in "/etc/fstab"
    matching the offending device's prior UUID with the new UUID allocated
    for that device. In other words,
    "sed -i -e 's/^UUID=old-uuid/UUID=new-uuid/' /etc/fstab". Always "sed".
    (Ideal.)
  * Display a non-fatal warning that formatting this device will probably
    render the entire system unbootable unless the user manually modifies
    "/etc/fstab", but permissively permit them to do it anyway. (Non-ideal,
    but this is Linux. Every user should have the right to catastrophically
    betray themselves.)
  * Display a fatal error prohibiting the user from formatting this device.
    (Don't do this.)
* The failure of the Ubuntu startup process to display coherent warnings or
  errors. I (and possibly other experienced users bludgeoned by this) would
  happily settle for a single line in either "dmesg" or "journalctl" output
  indicating the underlying issue. Ideally, however, boot failures should be
  detected by the startup process and:
  * Rendered into human-readable GUI-based error dialogues or CLI-based error
    messages. In either case, the user should be presented with some
    digestible snippet of text for subsequent feeding to Google. Boot
    failures shouldn't necessitate a postgraduate degree in Computer Science.
  * Accompanied by the option of dropping into a CLI-based login prompt or
    (failing that) at least into a rescue terminal with superuser privileges.
    For safety, all currently mounted filesystems should be remounted as
    read-only. Because something is better than no thing. No thing and the
    smugness of the Purple Screen of Death is all we've currently got.
* The failure of systemd. Oh, the complete failure of systemd. I don't even.
  On the bright side, you can only go up from rock-bottom destitution. Let's
  begin by belabouring the hopefully obvious:
  * The failure to mount optional automounts should *NOT* render the system
    unbootable. (This goes without saying. Apparently, it needs saying.)
  * The systemd journal ("journalctl") should explicitly log failing units.
    Bizarrely, it doesn't in this case. The only indication of imminent
    badness is a discontinuity after systemd attempts to start
    "NetworkManager.service". Although no explicit error or warning is
    logged, the startup process silently fails at that service and then
    auto-restarts itself from the beginning... until slamming into that
    service again, at which point the cycle of depressing incompetence
    recapitulates itself. Reliably logging errors and warnings would seem to
    be the principal use case for "journalctl", but wut do I know? (See no
    error, hear no error, log no error.)
  * To beat the dead horse, all autogenerated units mounting
    "/etc/fstab"-listed mountpoints should be strictly optional. Exceptions
    might include the obvious system prerequisites (e.g., "/usr", "/var").
    Since the root filesystem has (by definition) already been mounted and
    systemd has (by definition) already been exec-ed as the init process
    (PID 1), the system should at least be nominally bootable. There's little
    to no justification for automount failure to completely destroy
    bootability. Indeed, I'd probably advise all such units to be optional –
    regardless of whether they appear to be prerequisites or not. If "/usr"
    isn't mountable, for example, error messages to that effect will
    hopefully already be logged to "dmesg" and/or "journalctl" and/or printed
    to the current terminal. Moreover, subsequent units will implicitly fail
    by virtue of "/usr" going AWOL. Systemd prematurely halting the startup
    process at the first unmountable automount adds no tangible benefit to
    anything. Let the startup process proceed with fingers and tentacles
    crossed in the event of automount failure and may the succubus take the
    hindmost.
  * To that effect, all other units currently hard-requiring these mount
    units ("local-fs.target", possibly?) should be revised to use
    "Wants=INSERT-MOUNTPOINT-UNIT-NAME-HERE" rather than
    "Requires=INSERT-MOUNTPOINT-UNIT-NAME-HERE".

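Two of the remedies in that list can be sketched against the text of
"/etc/fstab" itself. The first function below is a Python rendering of the
"sed" one-liner. The second appends the "nofail" mount option to
non-essential entries; as far as I can tell from systemd.mount(5), "nofail"
makes a mount merely wanted rather than required by "local-fs.target", which
is the Wants-versus-Requires distinction argued for in the last bullet. The
function names and the ESSENTIAL set are my own assumptions, and the sketch
is illustrative rather than something to point at a live system unreviewed.

```python
# Assumption: mount points treated as too important to mark optional.
ESSENTIAL = {"/", "/usr", "/var", "/boot", "/boot/efi"}


def replace_uuid(fstab_text, old_uuid, new_uuid):
    """Rewrite entries whose first field is UUID=old_uuid to use new_uuid."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Commented-out entries never match: their first field starts with "#".
        if fields and fields[0] == "UUID=" + old_uuid:
            line = line.replace("UUID=" + old_uuid, "UUID=" + new_uuid, 1)
        out.append(line)
    return "\n".join(out) + "\n"


def add_nofail(fstab_text, essential=ESSENTIAL):
    """Append "nofail" to the options of every non-essential, non-swap entry.

    Modified lines are re-joined with single spaces, so column alignment in
    the original file is not preserved.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 4 and not fields[0].startswith("#")
        if is_entry and fields[1] not in essential and fields[2] != "swap":
            options = fields[3].split(",")
            if "nofail" not in options:
                fields[3] = ",".join(options + ["nofail"])
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "UUID=old-uuid /home/waluigi/wah! ext4 defaults 0 2\n"
        "/dev/sdd1 /mnt/sdd1 ext4 errors=remount-ro 0 0\n"
    )
    print(add_nofail(replace_uuid(sample, "old-uuid", "new-uuid")), end="")
```

Both functions return new text rather than writing to disk, so the result
can be reviewed before anything replaces the real "/etc/fstab".
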
In short, I mourn the untimely death of Upstart.

It'd be awesome-sauce if either the original submitter (Mark) or another
contributor with sufficient privileges could:

* Change the affected packages to at least "systemd" (and possibly
  "gnome-disks" as well, if that indeed be the disk utility that Ubuntu
  currently leverages).
* Escalate the importance to at least "High" (and possibly "Critical").

Life in the code trenches. It doesn't get uglier than this.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1463120

Title:
Failure to boot if fstab disk mounts fail

Status in linux package in Ubuntu:
Confirmed

Bug description:
I found this on my main 15.04 desktop but have reproduced it in a VM:
1. Install Ubuntu 15.04 (I included updates and non-free but I doubt that
   matters).
2. Add a bogus entry to /etc/fstab, e.g.:
   /dev/sdd1 /mnt/sdd1 ext4 errors=remount-ro 0 0
   Note that /mnt/sdd1 exists but /dev/sdd1 does not.
3. Reboot

Expected: Failure to mount /dev/sdd1 is reported, with an option to boot
without mounting it.
Actual: System appears to hang, although it will eventually present a root
terminal (with no indication of the cause of the problem). It appears to hang
before switching to graphical mode. As a result, warnings and errors that
occur on every boot, but are normally invisible because the screen clears
before they can be read, become visible for the first time and incorrectly
suggest that they are the cause of the problem.

Removing (or commenting out) the problematic entries and rebooting
allows the system to boot.

HOWEVER: The grub boot menu now appears and any pre-existing menu
timeout and default action seems to have been lost; it's now necessary
to select a boot option on each boot. This may be a separate bug (or
feature?).

My actual case: I have external drives permanently connected via USB, but
the USB card appears to have failed, so the drives are not accessible. With
no clues (and being unfamiliar with systemd), working out why the system
wouldn't boot was a tough job.
---
ApportVersion: 2.17.2-0ubuntu1.1
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: mark 1382 F.... pulseaudio
DistroRelease: Ubuntu 15.04
HibernationDevice: RESUME=UUID=76b7b79b-0cdd-4605-bc58-381fad8fa67f
InstallationDate: Installed on 2015-06-08 (4 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
IwConfig:
eth0 no wireless extensions.

lo no wireless extensions.
Lsusb:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 80ee:0021 VirtualBox USB Tablet
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: innotek GmbH VirtualBox
Package: linux (not installed)
ProcEnviron:
LANGUAGE=en_GB:en
TERM=xterm
PATH=(custom, no user)
LANG=en_GB.UTF-8
SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-20-generic
root=UUID=53f2fcea-e84c-4736-ac69-f34305a63432 ro quiet splash
ProcVersionSignature: Ubuntu 3.19.0-20.20-generic 3.19.8
PulseList:
Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not
accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
linux-restricted-modules-3.19.0-20-generic N/A
linux-backports-modules-3.19.0-20-generic N/A
linux-firmware 1.143.1
RfKill:

Tags: vivid
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 3.19.0-20-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias:
dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH
