After further troubleshooting with cgregan, we've narrowed this down.

We ran the following script on the node that was having trouble:

https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b

Unlike all the other devices MAAS works with, the Intel NVMe device reports a serial number that cannot be found anywhere in /dev/disk/by-id/*.

When curtin is supplied a serial number, it uses a heuristic to find the device as follows:

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/commands/block_meta.py#L270
http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/block/__init__.py#L601

So arguably, this is a bug in the Intel NVMe driver; the way it populates /dev/disk/* leaves much to be desired.

This is *arguably* also a bug in curtin (and maybe MAAS, since we knowingly use the serial number even though `udevadm` can tell us that the serial cannot be found anywhere in /dev/disk/by-id/*), in that we could do a better job dealing with devices backed by not-so-robust kernel drivers. But I think we shouldn't encourage bad behavior on the part of driver writers, so I'm on the fence about whether or not we should fix it.

But mostly, I would argue that this is a bug in the Intel NVMe driver. The way it exposes the device to userland is non-standard and arguably broken.

When we ran `udevadm info -q all -n nvme0n1` on the device, we got the following pseudo-output:

nvme0n1:
P: /devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
N: nvme0n1
S: SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
S: disk/by-id/nvme-INTEL
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL /dev/SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: DEVNAME=/dev/nvme0n1
E: DEVPATH=/devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
E: DEVTYPE=disk
E: ID_SERIAL=INTEL SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: ID_SERIAL_SHORT=CVMDxxxxxxxxxxxxxx
E: MAJOR=259
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=xxxxxxx

You can see from the lines that start with "S:" and the "DEVLINKS=" line that the way this device is exposed is very non-standard.
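
To make the mismatch in the output above concrete, here is a minimal Python sketch that checks whether a serial number appears in any /dev/disk/by-id link name. It is not curtin's actual lookup code: the function name `links_matching_serial` is hypothetical, and the plain substring match is an assumption made for illustration. The input values are copied from the udevadm output, placeholders included.

```python
import os


def links_matching_serial(devlinks, serial):
    """Return by-id link names from a udev DEVLINKS value that contain serial.

    Hypothetical helper for illustration; not part of curtin or MAAS.
    """
    matches = []
    for link in devlinks.split():
        # Only links under /dev/disk/by-id/ are considered.
        if not link.startswith("/dev/disk/by-id/"):
            continue
        name = os.path.basename(link)
        # Assumption: a simple substring match stands in for the real heuristic.
        if serial in name:
            matches.append(name)
    return matches


# Values copied from the udevadm output (placeholders kept as-is).
devlinks = "/dev/disk/by-id/nvme-INTEL /dev/SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx"
serial_short = "CVMDxxxxxxxxxxxxxx"

print(links_matching_serial(devlinks, serial_short))  # prints []
```

The only link under /dev/disk/by-id/ is nvme-INTEL, which carries no part of the serial, so a lookup along these lines comes back empty. The link that does contain the serial sits directly under /dev/, outside by-id.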

One would expect /dev/disk/by-id/* to contain a DEVLINK containing the serial number. Instead they expose a 'nvme-INTEL' link, which is (IMHO) a critical bug, because anyone expecting the things in /dev/disk/by-id/* to be unique will be in for a big surprise when they add a second NVMe device to a machine.

** Also affects: curtin
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
       Status: Invalid => New

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => New

https://bugs.launchpad.net/bugs/1651602

Title:
  Intel NVMe driver does not expose consistent links in /dev/disk/by-id