My hosts have 42 HDDs, sharing 3 NVMes for DB/WAL partitions (14 OSDs per
NVMe).
It's all deployed with ceph orch, containerized, using LVM, so it's probably
the most conventional HDD-based setup one can do.

I have the basic osd spec:
---
placement:
  host_pattern: "mimer-osd02"
service_id: osd_spec
service_type: osd
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

But unless the NVMe is completely empty, orch will simply never pick it up
(and wiping it is of course not an option, since that would take down the 13
other OSDs whose DBs live on it). Instead orch flat out ignores the
requirement that db_devices must be rotational: 0 and incorrectly suggests
this broken setup:

ceph orch apply -i osd_spec_osd02.yml --dry-run
...
################
OSDSPEC PREVIEWS
################
+---------+----------+-------------+---------------------+----+-----+
|SERVICE  |NAME      |HOST         |DATA                 |DB  |WAL  |
+---------+----------+-------------+---------------------+----+-----+
|osd      |osd_spec  |mimer-osd02  |/dev/mapper/mpathau  |-   |-    |
+---------+----------+-------------+---------------------+----+-----+

which is the worst outcome, so I have to set all specs to be unmanaged,
since otherwise they do the wrong thing automatically.
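(For completeness, the only more explicit knob I can find in the drivegroup
docs is pinning db_devices to explicit paths instead of the rotational
filter; a sketch of that variant is below, with the NVMe device names
assumed. It still depends on orch considering those devices available, of
course.)

cat > osd_spec_osd02_paths.yml <<'EOF'
service_type: osd
service_id: osd_spec
placement:
  host_pattern: "mimer-osd02"
spec:
  data_devices:
    rotational: 1
  db_devices:
    paths:             # assumed device names, adjust per host
      - /dev/nvme0n1
      - /dev/nvme1n1
      - /dev/nvme2n1
EOF
ceph orch apply -i osd_spec_osd02_paths.yml --dry-run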
So, most of https://docs.ceph.com/en/squid/cephadm/services/osd/ can just
be ignored, since it won't work. Instead I'm stuck with a long, complicated
procedure: make sure all specs are unmanaged, recreate the DB LV manually
(making up a UUID for it so the name matches the rest of the OSDs), then
manually enter a shell, copy over the client.bootstrap-osd keyring, and run
this huge

ceph-volume lvm prepare --bluestore --no-systemd --osd-id 12345 \
    --data /dev/mapper/mpathxx \
    --block.db /dev/ceph-2e401d48-931b-4529-88c0-d36424560xxx/osd-db-61264cc4-0f40-458b-a4d1-08709b919xxx

and then manually add the daemon and start it. I don't love this procedure,
and it's hard to explain to new staff all the things that can go wrong.
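Spelled out, the whole dance looks roughly like this (a sketch, not a
recipe: the spec file name and the 60G DB size are made up, and the LV name
just follows ceph-volume's osd-db-<uuid> convention):

# 1. make sure no OSD spec will act on the host (unmanaged: true in the spec)
ceph orch apply -i osd_spec_osd02_unmanaged.yml

# 2. recreate the DB LV on the existing ceph-* VG, inventing a UUID so the
#    LV name matches the other osd-db-* LVs on that NVMe
lvcreate -L 60G -n osd-db-$(uuidgen) ceph-2e401d48-931b-4529-88c0-d36424560xxx

# 3. enter a shell on the host and make the bootstrap keyring available
cephadm shell
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring

# 4. run the big ceph-volume lvm prepare command above

# 5. hand the prepared OSD back to cephadm so it creates and starts the daemon
ceph cephadm osd activate mimer-osd02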

I really miss a basic step-by-step guide for the most common operation:
replacing an HDD whose OSD has its DB on an SSD partition. I have tried
things with --replace and --no-destroy (a flag that isn't documented at
all), but I can't work out from the documentation what it wants from me when
I have separate DB partitions.
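For a plain HDD with no separate DB, my understanding of the documented flow
is roughly the following (osd.12 is just an example id); it's the equivalent
of this for the shared-DB case that I can't find:

# drain and remove the OSD, but keep its id reserved for the replacement
# disk (--zap would additionally clean up the old device's LVs)
ceph orch osd rm 12 --replace

# ...physically swap the HDD...

# a managed spec is then supposed to recreate osd.12 on the new disk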


--------

A bit of a tangent, but I also don't understand "REJECT REASONS" from ceph
orch device ls:

Insufficient space (<10 extents) on vgs, LVM detected

Why even bother looking at extents on VGs if the presence of LVM by itself
already rejects the drive?
In my case, it does detect that nvme0n1 has sufficient space on its VGs:

HOST         PATH                 TYPE  DEVICE ID                  SIZE   AVAILABLE  REFRESHED  REJECT REASONS
mimer-osd02  /dev/mapper/mpathau  hdd                              12.7T  Yes        10m ago
mimer-osd02  /dev/nvme0n1         ssd   KCM61VUL800G_7170A009TM38  745G   No         10m ago    Has a FileSystem, LVM detected
mimer-osd02  /dev/nvme1n1         ssd   KCM61VUL800G_7170A004TM38  745G   No         10m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
mimer-osd02  /dev/nvme2n1         ssd   KCM61VUL800G_7170A007TM38  745G   No         10m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
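(The free space it claims to evaluate is easy enough to check by hand on the
host, e.g. with

vgs -o vg_name,vg_size,vg_free,vg_free_count

which lists the free size and free extents per VG.)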

but, "LVM detected" already rejects it. Checking the code
(src/ceph-volume/ceph_volume/util_device.py):

        self.available_lvm, self.rejected_reasons_lvm = self._check_lvm_reject_reasons()
        self.available_raw, self.rejected_reasons_raw = self._check_raw_reject_reasons()
        self.available = self.available_lvm and self.available_raw
        self.rejected_reasons = list(set(self.rejected_reasons_lvm +
                                         self.rejected_reasons_raw))

So apparently the device must be available both as raw (i.e. have no VGs at
all) and also have enough free space in its VGs (i.e. have VGs). This code
doesn't make any sense at all.
In addition, it also checks the blkid output for any TYPE field:
    @property
    def has_fs(self) -> bool:
        self.load_blkid_api()
        return 'TYPE' in self.blkid_api
For a device in use by LVM, blkid reports something like {'UUID':
'3OWtic-3RCI-G3g5-7M6D-pKpb-eMJe-X0zgif', 'TYPE': 'LVM2_member'}, so again
it makes no sense to ever have _check_lvm_reject_reasons: to pass the raw
check the device can never be an LVM2_member anyway.
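(Easy to confirm on the host: a disk that is in use as an LVM PV reports
exactly that type, e.g.

blkid -o export /dev/nvme0n1

will include TYPE=LVM2_member.)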

Best regards, Mikael