Hello list.

Today, while redeploying an OSD, I noticed that the links to the DB/WAL devices
now point to the partitions themselves, not to the partition UUIDs as they did
before.
I think this changed with the latest ceph-deploy.

I'm using 12.2.2 on my mon/osd nodes.
ceph-deploy is 2.0.1 on the admin node.
All nodes use Ubuntu 16.04.

Here's what I'm talking about.

Consider these two OSDs on one node: osd.4 is an "old" OSD, while osd.6 is a
"new" one.
$ df
...
tmpfs                            7.9G   24K  7.9G   1% /var/lib/ceph/osd/ceph-6
/dev/sda1                         94M  5.4M   89M   6% /var/lib/ceph/osd/ceph-4

$ ll /var/lib/ceph/osd/ceph-4
...
lrwxrwxrwx 1 ceph ceph   58 Feb 20  2018 block -> /dev/disk/by-partuuid/21d7a19b-b520-4fa9-b291-2cd4a215b67b
lrwxrwxrwx 1 ceph ceph   58 Feb 20  2018 block.db -> /dev/disk/by-partuuid/ad56b072-28dd-4b27-85f2-06067546a0f2

$ ll /var/lib/ceph/osd/ceph-6
...
lrwxrwxrwx 1 root root    9 Nov  7 14:16 block.db -> /dev/sdb3
lrwxrwxrwx 1 root root    9 Nov  7 14:16 block.wal -> /dev/sdb4

I'm using an external SSD for DB/WAL, and newly created OSDs now point to
partition names.
The following two ceph-deploy commands behave identically: both create an OSD
whose links point to partitions, not UUIDs:
$ ceph-deploy osd create --bluestore --data /dev/sdc --block-db /dev/sdb1 \
    --block-wal /dev/sdb2 ceph-osd2
$ ceph-deploy osd create --bluestore --data /dev/sdc \
    --block-db /dev/disk/by-partuuid/e703a35b-de8d-46e0-9fb2-4a4fd0b49c91 \
    --block-wal /dev/disk/by-partuuid/d2db7017-4d57-45f8-95ff-0131b3b2de7d ceph-osd2
The resulting OSD has links to partition names:
$ ll /var/lib/ceph/osd/ceph-0
...
lrwxrwxrwx  1 root root    9 Jan 21 16:46 block.db -> /dev/sdb1
lrwxrwxrwx  1 root root    9 Jan 21 16:46 block.wal -> /dev/sdb2
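
To spot such links on a whole node without reading each listing by hand,
something like the sketch below could be used. The function name
check_osd_links is made up for illustration, and the default root
/var/lib/ceph/osd is assumed from the paths above. It only looks at block.db
and block.wal and prints those whose target is not under /dev/disk/by-*.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): report block.db / block.wal
# symlinks that point at a kernel device name such as /dev/sdb3 rather
# than at a persistent /dev/disk/by-* path.
check_osd_links() {
    root="${1:-/var/lib/ceph/osd}"    # assumed default OSD directory
    for link in "$root"/*/block.db "$root"/*/block.wal; do
        [ -L "$link" ] || continue    # skip unmatched globs and non-links
        target=$(readlink "$link")
        case "$target" in
            /dev/disk/by-*) ;;        # persistent name, nothing to report
            *) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```

Calling check_osd_links with no argument on an OSD node should list links like
the block.db and block.wal ones shown for ceph-0 above, and stay silent for
OSDs that use by-partuuid paths.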


I first saw this change when I updated ceph-deploy but didn't pay attention
at the time. Today it hit me: if for some reason my DB/WAL SSD changes its
name, for example on a node reboot, my OSDs will not find their DB/WAL devices
and will not start. At the very least this will require manual intervention,
and it could get worse if more than one DB/WAL device is used on a single OSD
node.
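
For reference, the persistent name that belongs to a given partition can be
looked up by resolving the entries under /dev/disk/by-partuuid. The sketch
below does that; partuuid_link_for is a hypothetical name, and the optional
second argument exists only so the function can be tried against a test
directory. It assumes a readlink that supports -f (GNU coreutils does). It
only reports the name and does not change any OSD link.

```shell
#!/bin/sh
# Hypothetical helper: print the by-partuuid symlink that resolves to the
# given device node, e.g. partuuid_link_for /dev/sdb1
partuuid_link_for() {
    dev=$(readlink -f "$1")                  # canonical path of the device
    dir="${2:-/dev/disk/by-partuuid}"        # udev-maintained directory
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
            return 0
        fi
    done
    return 1                                 # no matching entry found
}
```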

Is there something wrong with my OSD deployment scheme? Why was the link naming
logic changed from UUIDs to plain partition names? Or am I imagining this
threat, and Ceph can compensate in such a case?