Greetings community,

We have a setup of six servers running a CentOS 8 minimal installation with 
Ceph 18.2.2 (Reef), 20 Gbps fiber NICs and dual Intel Xeon processors. The 
cluster was bootstrapped on the first node and then expanded to the other 
nodes using cephadm; monitor daemons run on five of the nodes and manager 
daemons on three. Each server has an NVMe boot disk and a 1 TB SATA SSD on 
which the OSDs are deployed. We created an EC profile with k=3 and m=3 and 
serve a CephFS filesystem on top of it, with NFS exports to other servers. 
Up to this point the setup has been quite stable, in the sense that after an 
emergency reboot or a network failure the OSDs did not fail and started 
normally again after the reboot.
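
For context, an EC profile and filesystem like ours can be created along 
these lines; the profile, pool and filesystem names below are placeholders, 
and crush-failure-domain=host is an assumption on our part rather than a 
record of exactly what we ran:

ceph osd erasure-code-profile set ec-3-3 k=3 m=3 crush-failure-domain=host
ceph osd pool create cephfs.data erasure ec-3-3
ceph osd pool set cephfs.data allow_ec_overwrites true
ceph osd pool create cephfs.meta replicated
ceph fs new cephfs cephfs.meta cephfs.data --force   # --force because the default data pool is EC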

At a certain point in our project we needed to enable the multipathd 
service, adding the boot drive and the Ceph OSD SSD to its blacklist so that 
they would not be claimed and set up as mpath devices. The blacklist entries 
look like this:

boot blacklist:
===============
blacklist {
    wwid "eui.<drive_id>"
}

SATA SSD blacklist:
===================
blacklist {
    wwid "naa.<drive_id>"
}
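
Equivalently, both entries can live in a single blacklist section of 
/etc/multipath.conf (the IDs below are placeholders):

blacklist {
    wwid "eui.<boot_drive_id>"
    wwid "naa.<sata_ssd_id>"
}

After editing, the configuration can be reloaded with "multipathd 
reconfigure" or "systemctl reload multipathd".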

With the above blacklist in place, both the boot disk and the Ceph OSD 
function properly, and lsblk shows the following:

NAME                                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                    8:0    0 894.3G  0 disk
└─ceph--<id>-osd--block--<block_id>  252:3    0 894.3G  0 lvm
nvme0n1                              259:0    0 238.5G  0 disk
├─nvme0n1p1                          259:1    0   600M  0 part /boot/efi
├─nvme0n1p2                          259:2    0     1G  0 part /boot
└─nvme0n1p3                          259:3    0 236.9G  0 part
  ├─centos-root                      252:0    0   170G  0 lvm  /
  ├─centos-swap                      252:1    0  23.4G  0 lvm  [SWAP]
  ├─centos-var_log_audit             252:2    0   7.5G  0 lvm  /var/log/audit
  ├─centos-home                      252:4    0    26G  0 lvm  /home
  └─centos-var_log                   252:5    0    10G  0 lvm  /var/log

In addition to the multipathd configuration above, we have use_devicesfile=1 
set in /etc/lvm/lvm.conf, with the /etc/lvm/devices/system.devices file 
looking like the snippet below. The PVID values were taken from the output of 
the pvdisplay command, and the IDNAME values from the output of 
"ls -lha /dev/disk/by-id":

VERSION=1.1.1
IDTYPE=sys_wwid IDNAME=eui.<drive_id> DEVNAME=/dev/nvme0n1p3 PVID=<pvid> PART=3
IDTYPE=sys_wwid IDNAME=naa.<drive_id> DEVNAME=/dev/sda PVID=<pvid>
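
For what it's worth, the devices file can also be managed with the 
lvmdevices tool instead of being edited by hand, provided the lvm2 build 
ships devices-file support; the device paths below are examples:

lvmdevices                          # list the entries LVM currently knows about
lvmdevices --adddev /dev/sda        # add the Ceph OSD data disk
lvmdevices --adddev /dev/nvme0n1p3  # add the PV holding the system LVs
lvmdevices --check                  # verify the entries against the devices found on the system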


The issues started when we ran certain tests of the system's integrity, the 
most important being an emergency shutdown and reboot of all the nodes. 
After that, the OSDs are not started automatically and their respective LVM 
volumes do not show up (except on a single node, for some reason), so the 
lsblk output changes to the snippet below, and we have to reboot the nodes 
one by one until all the OSDs are back online:

NAME                     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                        8:0    0 894.3G  0 disk
nvme0n1                  259:0    0 238.5G  0 disk
├─nvme0n1p1              259:1    0   600M  0 part /boot/efi
├─nvme0n1p2              259:2    0     1G  0 part /boot
└─nvme0n1p3              259:3    0 236.9G  0 part
  ├─centos-root          252:0    0   170G  0 lvm  /
  ├─centos-swap          252:1    0  23.4G  0 lvm  [SWAP]
  ├─centos-var_log_audit 252:2    0   7.5G  0 lvm  /var/log/audit
  ├─centos-home          252:4    0    26G  0 lvm  /home
  └─centos-var_log       252:5    0    10G  0 lvm  /var/log
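
On a node in this failed state, the problem can be narrowed down with 
standard LVM tooling; the VG name below is a placeholder:

pvs -a                             # is the OSD PV visible to LVM at all?
vgs
lvs -o lv_name,vg_name,lv_active   # is the OSD LV known but inactive?
dmsetup ls                         # is there a device-mapper node for it?
vgchange -ay ceph-<id>             # try to activate the OSD VG manually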

Without the LVM devices-file configuration and with the multipathd service 
disabled, everything works fine; this behavior only started after those 
changes. Attempting to restart the OSDs from a manager node with ceph orch 
daemon restart osd.n leaves them in an error state, and even when we start 
the OSD manually on each node via bash /var/lib/ceph/<fsid>/osd.0/unit.run 
we get the following error:


--> Failed to activate via raw: did not find any matching OSD to activate
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir 
--dev /dev/ceph-<id>/osd-block-<block_id> --path /var/lib/ceph/osd/ceph-0 
--no-mon-config
 stderr: failed to read label for /dev/ceph-<id>/osd-block-<block_id>: (2) No 
such file or directory
2024-03-30T12:42:54.014+0000 7f845296a980 -1 
bluestore(/dev/ceph-<id>/osd-block-<block_id>) _read_bdev_label failed to open 
/dev/ceph-<id>/osd-block-<block_id>: (2) No such file or directory
--> Failed to activate via LVM: command returned non-zero exit status: 1
--> Failed to activate via simple: 'Namespace' object has no attribute 
'json_config'
--> Failed to activate any OSD(s)
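
For reference, the same activation path can also be exercised through the 
ceph-volume bundled with cephadm, or via the systemd unit once the LV is 
visible again; the FSID and OSD id below are placeholders:

cephadm ceph-volume -- lvm list                          # what ceph-volume sees on this host
cephadm ceph-volume -- lvm activate --all --no-systemd   # activate whatever it finds
systemctl restart ceph-<fsid>@osd.0.service              # restart the cephadm-managed unit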



A successful run of the same command produces the following output:

/bin/bash /var/lib/ceph/<fsid>/osd.0/unit.run
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path 
/var/lib/ceph/osd/ceph-0 --no-mon-config --dev 
/dev/mapper/ceph--<id>-osd--block--<block_id>
Running command: /usr/bin/chown -h ceph:ceph 
/dev/mapper/ceph--<id>-osd--block--<block_id>
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5
Running command: /usr/bin/ln -s /dev/mapper/ceph--<id>-osd--block--<block_id> 
/var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
--> ceph-volume raw activate successful for osd ID: 0
ceph-<fsid>-osd-0
4361e2f166bcdeee6e9020dcbb153d3d7eec04e71d5b0b250440d4a3a0833f2c



In the failure cases it seems to us as if the logical volume is not even 
detected at boot by the device mapper, which is odd; it also does not show 
up in the output of dmsetup ls. What could we be missing here? What is the 
conflict between the Ceph OSDs and the multipathd service, or the LVM 
configuration? Should the system.devices entries be different from what we 
set? Is the multipathd blacklist configuration missing something? We have 
been running trial-and-error experiments for more than a week now and have 
looked at the lvm2 and multipathd logs (we can provide them on request), but 
to no avail: nothing indicates an error, just normal logs, with the only 
difference being the missing Ceph OSD LVM volume.
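
In case it helps the diagnosis, the device identifiers can be cross-checked 
against the blacklist and the system.devices entries with commands along 
these lines (device names are placeholders):

/usr/lib/udev/scsi_id -g -u /dev/sda    # WWID of the SATA SSD as udev/multipath sees it
cat /sys/class/block/nvme0n1/wwid       # EUI of the NVMe boot disk
lsblk -o NAME,TYPE,WWN                  # WWNs as the kernel reports them
multipath -ll                           # what multipath currently claims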



Best regards