Ryan,

We believe this is a bug: we expect curtin to wipe the disks. In this case it fails to wipe them, and that occasionally causes issues with our automation deploying Ceph on those disks. This may be more of an issue with LVM, a race condition hit while wiping all of the disks sequentially, given the large number of disks/VGs/LVs involved.

To clarify my previous testing: I was mistaken. I thought MAAS used the commissioning OS as the ephemeral OS to deploy from, but that is not the case; MAAS uses the specified deployment OS as the ephemeral image to deploy from. All of my previous testing was therefore done with Bionic using the 4.15 kernel. This confirms it is a race condition somewhere, since the error does not always reproduce, and it was just a coincidence that I was changing the commissioning OS at the time.
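For reference, below is a rough reproducer sketch, not our actual automation: it assumes loop devices standing in for the real disks, uses hypothetical names (testvg, /tmp/pv0.img), requires root, and simply creates many LVs and then removes them back to back the way the wipe path does. It may or may not trigger the race on a given kernel.

```python
#!/usr/bin/env python3
"""Rough reproducer sketch: create many LVs on a loop-backed VG, then remove
them sequentially with no settling in between, watching for the
'Device or resource busy' warning from device-mapper."""
import subprocess


def run(cmd, **kwargs):
    # Thin wrapper: run a command, fail loudly, capture output as text.
    return subprocess.run(cmd, check=True, capture_output=True, text=True, **kwargs)


# Back a throwaway VG with a sparse file on a loop device (hypothetical names).
run(['truncate', '-s', '10G', '/tmp/pv0.img'])
loop_dev = run(['losetup', '--find', '--show', '/tmp/pv0.img']).stdout.strip()
run(['vgcreate', 'testvg', loop_dev])

lvs = [f'lv{i}' for i in range(20)]
for lv in lvs:
    run(['lvcreate', '--size', '256M', '--name', lv, 'testvg'])

# Remove the LVs back to back, mirroring the sequential wipe, and check stderr
# for the device-mapper busy warning even when lvremove exits 0.
for lv in lvs:
    result = subprocess.run(
        ['lvremove', '--force', '--force', f'testvg/{lv}'],
        capture_output=True, text=True)
    if 'Device or resource busy' in result.stderr:
        print(f'hit the device-mapper busy race on testvg/{lv}')
```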
I tested this morning and was able to reproduce the issue with Bionic 4.15 and Xenial 4.4; however, I have yet to reproduce it using either the Bionic or Xenial HWE kernels. I will upload the curtin logs and config from my reproducer now.

https://bugs.launchpad.net/bugs/1871874

Title:
  lvremove occasionally fails on nodes with multiple volumes and curtin
  does not catch the failure

Status in curtin package in Ubuntu:
  Incomplete
Status in linux package in Ubuntu:
  Incomplete

Bug description:
  For example:

  Wiping lvm logical volume: /dev/ceph-db-wal-dev-sdc/ceph-db-dev-sdi
  wiping 1M on /dev/ceph-db-wal-dev-sdc/ceph-db-dev-sdi at offsets [0, -1048576]
  using "lvremove" on ceph-db-wal-dev-sdc/ceph-db-dev-sdi
  Running command ['lvremove', '--force', '--force', 'ceph-db-wal-dev-sdc/ceph-db-dev-sdi'] with allowed return codes [0] (capture=False)
  device-mapper: remove ioctl on (253:14) failed: Device or resource busy
  Logical volume "ceph-db-dev-sdi" successfully removed

  On a node with 10 disks configured as follows:

  /dev/sda2  /
  /dev/sda1  /boot
  /dev/sda3  /var/log
  /dev/sda5  /var/crash
  /dev/sda6  /var/lib/openstack-helm
  /dev/sda7  /var
  /dev/sdj1  /srv

  sdb and sdc are used for BlueStore WAL and DB
  sdd, sde, sdf: ceph OSDs, using sdb
  sdg, sdh, sdi: ceph OSDs, using sdc

  Across multiple servers this happens occasionally with various disks. It looks like this may be a race condition, possibly in LVM, as curtin is wiping multiple volumes when lvremove fails.
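As a minimal sketch of the kind of guard that would catch this in the wipe path (illustrative only, not curtin's implementation): after each lvremove, settle udev and verify that the device-mapper node for the LV has actually disappeared before moving on, rather than trusting the exit code alone. The LV name below is taken from the log excerpt above; the retry counts and delays are arbitrary assumptions.

```python
#!/usr/bin/env python3
"""Illustrative sketch: remove an LV and confirm its device node is gone."""
import os
import subprocess
import time


def remove_lv_and_verify(vg_lv, attempts=5, delay=2.0):
    """Remove a logical volume, then wait until its /dev node is really gone.

    vg_lv is '<vg>/<lv>'. In the log above, lvremove exits 0 while still
    printing 'remove ioctl ... Device or resource busy', so success is judged
    by the device node disappearing, not by the exit code alone.
    """
    dev_path = os.path.join('/dev', vg_lv)
    subprocess.run(['lvremove', '--force', '--force', vg_lv], check=True)
    for _ in range(attempts):
        # Let udev finish processing the remove events before re-checking.
        subprocess.run(['udevadm', 'settle'], check=False)
        if not os.path.exists(dev_path):
            return
        time.sleep(delay)
    raise RuntimeError(f'{dev_path} still present after lvremove')


if __name__ == '__main__':
    # LV name from the log excerpt above.
    remove_lv_and_verify('ceph-db-wal-dev-sdc/ceph-db-dev-sdi')
```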