On 23/08/2018 at 15:20, Alfredo Deza wrote:
Thanks Alfredo for your reply. I'm using the latest version of Luminous
(12.2.7) and ceph-deploy (2.0.1).
I have no problem creating my OSDs, that works perfectly.
My issue only concerns the device names of the NVMe partitions, which
change after a reboot when there is more than one NVMe device on the
OSD node.
ceph-volume is pretty resilient to partition changes because it stores
the PARTUUID of the partition in LVM, and it queries it at each boot.
Note that for bluestore there is no mounting whatsoever. Have you
created partitions with a PARTUUID on the NVMe devices for block.db?
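(If in doubt, the PARTUUID of a GPT partition can be checked with
something like this; the device path is just an example:

# blkid -s PARTUUID -o value /dev/nvme0n1p1

or by listing the symlinks under /dev/disk/by-partuuid/.)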
Here is how I created my BlueStore OSDs (on the first OSD node):
1) On the OSD node node-osd0, I first created the block.db partitions on
the NVMe device (PM1725a 800GB), like this:
# parted /dev/nvme0n1 mklabel gpt
# echo "1 0 10
2 10 20
3 20 30
4 30 40
5 40 50
6 50 60
7 60 70
8 70 80
9 80 90
10 90 100" | while read num beg end; do parted /dev/nvme0n1 mkpart $num
$beg% $end%; done
Extract from cat /proc/partitions:
259 2 781412184 nvme1n1
259 3 781412184 nvme0n1
259 5 78140416 nvme0n1p1
259 6 78141440 nvme0n1p2
259 7 78140416 nvme0n1p3
259 8 78141440 nvme0n1p4
259 9 78141440 nvme0n1p5
259 10 78141440 nvme0n1p6
259 11 78140416 nvme0n1p7
259 12 78141440 nvme0n1p8
259 13 78141440 nvme0n1p9
259 15 78140416 nvme0n1p10
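To keep a stable reference to these partitions, I suppose I can read
their PARTUUIDs with something like this (the values will of course
differ on each node):

# lsblk -o NAME,PARTUUID /dev/nvme0n1

or simply look at the symlinks in /dev/disk/by-partuuid/.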
2) Then, from the admin node, I created my first 10 OSDs like this:
echo "/dev/sda /dev/nvme0n1p1
/dev/sdb /dev/nvme0n1p2
/dev/sdc /dev/nvme0n1p3
/dev/sdd /dev/nvme0n1p4
/dev/sde /dev/nvme0n1p5
/dev/sdf /dev/nvme0n1p6
/dev/sdg /dev/nvme0n1p7
/dev/sdh /dev/nvme0n1p8
/dev/sdi /dev/nvme0n1p9
/dev/sdj /dev/nvme0n1p10" | while read hdd db; do
    ceph-deploy osd create --debug --bluestore --data $hdd --block-db $db node-osd0
done
Do you mean that, at this stage, I should pass the PARTUUID-based path
directly as the value of --block-db (i.e. replace /dev/nvme0n1p1 with
its PARTUUID), is that right?
So far I have created 60 OSDs like that. The ceph cluster is HEALTH_OK
and all OSDs are up and in. But I'm not yet in production and there is
only test data on it, so I can destroy everything and rebuild my OSDs.
Is that what you advise me to do here, taking care to specify the
PARTUUID for block.db instead of the device names?
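If so, I suppose the creation loop from step 2 would become something
like this (the PARTUUIDs below are only placeholders, to be replaced
with the real values found under /dev/disk/by-partuuid/):

echo "/dev/sda /dev/disk/by-partuuid/<partuuid-of-nvme0n1p1>
/dev/sdb /dev/disk/by-partuuid/<partuuid-of-nvme0n1p2>
...
/dev/sdj /dev/disk/by-partuuid/<partuuid-of-nvme0n1p10>" | while read hdd db; do
    ceph-deploy osd create --debug --bluestore --data $hdd --block-db $db node-osd0
done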
For instance, if I have two NVMe devices, the first time the first
device shows up as /dev/nvme0n1 and the second as /dev/nvme1n1. After a
node restart these names can be swapped, i.e. the first device is named
/dev/nvme1n1 and the second one /dev/nvme0n1! The result is that the
OSDs no longer find their metadata and do not start up...
This sounds very odd. Could you clarify where block and block.db are?
It would also be useful to take a look at
/var/log/ceph/ceph-volume-systemd.log and ceph-volume.log to see how
ceph-volume is trying to get this OSD up and running.
It would also be useful to check `ceph-volume lvm list` to verify that,
regardless of the name change, it recognizes the correct partition
mapped to the OSD.
Oops!
# ceph-volume lvm list
--> KeyError: 'devices'
Thank you again,
Hervé
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com