That's probably the ceph-disk udev script being triggered from something somewhere (and a lot of things can trigger that script...)
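If you want to confirm that on one of the nodes, something like the following should show whether the ceph-disk rules and units are firing during the zap (a quick check, not a recipe; the exact rule file and unit names depend on the Ceph release):

# list the udev rules shipped by ceph-disk (names may vary by release)
ls /lib/udev/rules.d/ | grep -i ceph

# in a second terminal, watch block-device udev events while the zap runs;
# a "change" event on sda1 right before the remount points at udev
udevadm monitor --udev --subsystem-match=block

# ceph-disk activation runs through transient systemd units, so their
# logs show who re-mounted the filesystem
journalctl -u 'ceph-disk@*' --since "-10 min"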
Work-around: convert everything to ceph-volume simple first by running
"ceph-volume simple scan" and "ceph-volume simple activate"; that will
disable udev in the intended way. (A rough sketch of the adjusted script
is at the bottom of this mail.)

BTW: you can run destroy before stopping the OSD, and you won't need the
--yes-i-really-mean-it if the OSD is drained in that case.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Nov 4, 2019 at 6:33 PM J David <j.david.li...@gmail.com> wrote:
>
> While converting a luminous cluster from filestore to bluestore, we
> are running into a weird race condition on a fairly regular basis.
>
> We have a master script that writes upgrade scripts for each OSD
> server. The script for an OSD looks like this:
>
> ceph osd out 68
> while ! ceph osd safe-to-destroy 68 ; do sleep 10 ; done
> systemctl stop ceph-osd@68
> sleep 10
> systemctl kill ceph-osd@68
> sleep 10
> umount /var/lib/ceph/osd/ceph-68
> ceph osd destroy 68 --yes-i-really-mean-it
> ceph-volume lvm zap /dev/sda --destroy
> ceph-volume lvm create --bluestore --data /dev/sda --osd-id 68
> sleep 10
> while [ "`ceph health`" != "HEALTH_OK" ] ; do ceph health; sleep 10 ; done
>
> (It's run with sh -e, so any error will cause an abort.)
>
> The problem we run into is that about 1 run in 10 fails when it gets
> to the "lvm zap" stage:
>
> --> Zapping: /dev/sda
> Running command: wipefs --all /dev/sda2
> Running command: dd if=/dev/zero of=/dev/sda2 bs=1M count=10
>  stderr: 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB, 10 MiB) copied, 0.00667608 s, 1.6 GB/s
> --> Destroying partition since --destroy was used: /dev/sda2
> Running command: parted /dev/sda --script -- rm 2
> --> Unmounting /dev/sda1
> Running command: umount -v /dev/sda1
>  stderr: umount: /var/lib/ceph/tmp/mnt.9k0GDx (/dev/sda1) unmounted
> Running command: wipefs --all /dev/sda1
>  stderr: wipefs: error: /dev/sda1: probing initialization failed:
>  stderr: Device or resource busy
> --> RuntimeError: command returned non-zero exit status: 1
>
> And, lo and behold, it's right: /dev/sda1 has been remounted as
> /var/lib/ceph/osd/ceph-68.
>
> That's after the OSD has been stopped, killed, and destroyed; there
> *is no* osd.68. It happens after the filesystem has been unmounted
> twice (once by an explicit umount and once by "lvm zap"). The "lvm
> zap" umount shown here, with the path /var/lib/ceph/tmp/mnt.9k0GDx,
> suggests that the remount is happening in the background somewhere
> while the lvm zap is running.
>
> If we do the zap before the osd destroy, the same thing happens, but
> the (still-existing) OSD does not actually restart. So it's just the
> filesystem that won't stay unmounted long enough to destroy it, not
> the whole OSD.
>
> What's causing this? How do we keep the filesystem from lurching out
> of the grave in mid-conversion like this?
>
> This is on Debian Stretch with systemd, if that matters.
>
> Thanks!
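P.S. Here is a rough, untested sketch of the per-OSD script with the
work-around folded in. It assumes ceph-volume's "simple" subcommand is
available (Luminous 12.2.2 or later); treat it as a starting point, not
a recipe, and adjust the OSD id and device to your layout.

# one-time per OSD, while the filestore OSD is still mounted: hand
# control from the ceph-disk udev rules to ceph-volume simple (writes a
# JSON description under /etc/ceph/osd/ and disables the ceph-disk
# activation path)
ceph-volume simple scan /var/lib/ceph/osd/ceph-68
ceph-volume simple activate --all
# (or: ceph-volume simple activate 68 <osd fsid from the scan output>)

# then the conversion; destroy can move ahead of the stop, and the
# --yes-i-really-mean-it should no longer be needed once the OSD is
# drained and reported safe-to-destroy
ceph osd out 68
while ! ceph osd safe-to-destroy 68 ; do sleep 10 ; done
ceph osd destroy 68
systemctl stop ceph-osd@68
umount /var/lib/ceph/osd/ceph-68
ceph-volume lvm zap /dev/sda --destroy
ceph-volume lvm create --bluestore --data /dev/sda --osd-id 68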