After more troubleshooting with cgregan, we've narrowed this down
further.

We ran the following script on the node that was having trouble:

https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b

Unlike all the other devices MAAS works with, the Intel NVMe device
reports a serial number that cannot be found anywhere in
/dev/disk/by-id/*. When curtin is supplied a serial number, it uses a
heuristic to find the device as follows:

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/commands/block_meta.py#L270

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/block/__init__.py#L601
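
For readers who don't want to follow the links, here is a minimal sketch
of that kind of lookup. It is not curtin's code: the function names and
the `by_id_dir` parameter are my own, and I am assuming the heuristic
boils down to "find a /dev/disk/by-id/ entry whose name contains the
serial, then resolve the symlink".

```python
import os


def find_by_id_links(serial, by_id_dir="/dev/disk/by-id"):
    """Return the by-id symlink paths whose names contain `serial`.

    Hypothetical helper for illustration; not part of curtin.
    """
    try:
        names = sorted(os.listdir(by_id_dir))
    except FileNotFoundError:
        return []
    return [os.path.join(by_id_dir, name) for name in names if serial in name]


def lookup_disk(serial, by_id_dir="/dev/disk/by-id"):
    """Resolve a serial number to a device node via its by-id symlink."""
    links = find_by_id_links(serial, by_id_dir)
    if not links:
        # This is the failure mode described here: no link name
        # contains the serial, so there is nothing to resolve.
        raise ValueError("no disk with serial %r found" % serial)
    return os.path.realpath(links[0])
```

With a lookup like this, a device whose only by-id link is named
'nvme-INTEL' can never be matched by its serial number.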

So arguably, this is a bug in how the Intel NVMe device's serial number
is exposed; the way it populates /dev/disk/* leaves much to be desired.

This is *arguably* a bug in curtin (and maybe MAAS, since we knowingly
use the serial number even though `udevadm` can tell us that the serial
cannot be found anywhere in /dev/disk/by-id/*), in that we could do a
better job dealing with devices backed by not-so-robust kernel drivers.
But I think we shouldn't encourage bad behavior on the part of driver
writers, so I'm on the fence about whether or not we should fix it.

But mostly, I would argue that this is a bug in the Intel NVMe driver.
The way they expose the device to userland is non-standard and arguably
broken. When we ran `udevadm info -q all -n nvme0n1` on the device, we
got the following pseudo-output:

nvme0n1:
P: /devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
N: nvme0n1
S: SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
S: disk/by-id/nvme-INTEL
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL /dev/SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: DEVNAME=/dev/nvme0n1
E: DEVPATH=/devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
E: DEVTYPE=disk
E: ID_SERIAL=INTEL SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: ID_SERIAL_SHORT=CVMDxxxxxxxxxxxxxx
E: MAJOR=259
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=xxxxxxx

You can see from the lines that start with "S:" and the "DEVLINKS=" line
that the way this device is exposed is very non-standard. One would
expect /dev/disk/by-id/* to contain a link that includes the serial
number. Instead they expose an 'nvme-INTEL' link, which is (IMHO) a
critical bug, because anyone expecting the entries in /dev/disk/by-id/*
to be unique will be in for a big surprise when they add a second NVMe
device to a machine.
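
As a rough way to check other machines for the same problem, the sketch
below parses `udevadm info -q all` output and reports whether
ID_SERIAL_SHORT shows up in any disk/by-id link. The function names are
hypothetical, and the parser only handles the "S:" and "E:" line formats
seen in the output above.

```python
def parse_udev_info(text):
    """Split `udevadm info -q all` output into symlinks and properties."""
    links, props = [], {}
    for line in text.splitlines():
        prefix, _, value = line.partition(": ")
        if prefix == "S":
            links.append(value.strip())
        elif prefix == "E" and "=" in value:
            key, _, val = value.partition("=")
            props[key] = val.strip()
    return links, props


def serial_in_by_id(text):
    """True if some disk/by-id link name contains ID_SERIAL_SHORT."""
    links, props = parse_udev_info(text)
    serial = props.get("ID_SERIAL_SHORT", "")
    by_id = [link for link in links if link.startswith("disk/by-id/")]
    return bool(serial) and any(serial in link for link in by_id)


# Redacted lines taken from the output above.
SAMPLE = """\
S: SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
S: disk/by-id/nvme-INTEL
E: ID_SERIAL=INTEL SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: ID_SERIAL_SHORT=CVMDxxxxxxxxxxxxxx
"""
print(serial_in_by_id(SAMPLE))  # prints False
```

On a live system the text could come from something like
`subprocess.check_output(["udevadm", "info", "-q", "all", "-n", "nvme0n1"], text=True)`.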

** Also affects: curtin
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
       Status: Invalid => New

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => New

https://bugs.launchpad.net/bugs/1651602

Title:
  Intel NVMe driver does not expose consistent links in /dev/disk/by-id