Peter,

I don't think udev factors in here, based on the original question. Firstly, I'm not sure udev deals with permanently-attached devices (it's more for hot-swap items); secondly, the original complaint mentioned LVM specifically.

I agree that the hosts seem overloaded, by the way. It sounds like large disks are being subdivided into many smaller ones, which would be a bad thing to do to Ceph on HDDs, and while SSDs don't have the seek and rotational liabilities of HDDs, it's still questionable how many OSDs you really should be carving out of one physical unit that way.

Ceph, for reasons I never discovered, prefers that you create OSDs that own either an entire physical disk or an LVM Logical Volume (LV), but NOT a plain disk partition. I find that curious, since LVs, unlike traditional partitions, aren't necessarily contiguous space (again, more of a liability for HDDs than SSDs), but there you are. Incidentally, LVs are contained in Volume Groups (VGs), and the whole thing can end up with parts scattered over multiple Physical Volumes (PVs).
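
If it helps to see the layering, here's a rough sketch of the PV -> VG -> LV chain under one OSD. The device and VG/LV names are made up for illustration, not taken from the original post, and ceph-volume will normally do all of this for you if you just hand it the raw disk:

    pvcreate /dev/sdb                                        # whole disk becomes an LVM Physical Volume
    vgcreate ceph-vg-sdb /dev/sdb                            # one Volume Group on that PV
    lvcreate -l 100%FREE -n osd-block-sdb ceph-vg-sdb        # one LV filling the VG
    ceph-volume lvm create --data ceph-vg-sdb/osd-block-sdb  # hand the LV to Ceph as an OSD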

When an LVM-supporting OS boots, part of the process is to scan for and activate the Logical Volumes (vgchange -ay, or the distribution's equivalent), and from the information given it looks like that activation hasn't completed before Ceph starts up and begins trying to use the LVs. Boot-time activation is normally pretty quick, since it would be rare to have more than a dozen or so LVs in the system.
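
If you want to check that guess, the standard LVM and journal tools should show it. These are generic commands, not anything specific to your boxes, and the exact log wording will vary by distribution:

    lvs -o lv_name,vg_name,lv_active     # per-LV "active" flag right now
    lvscan | grep -c ACTIVE              # quick count of active LVs
    journalctl -b -u 'lvm2-*'            # LVM activation messages from this boot
    journalctl -b -u 'ceph-osd@*'        # compare timestamps with the OSD start attempts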

But in this case, more than 100 LVs are being activated at boot time, and the systemd boot process apparently doesn't account for the extra time needed to do that.
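
If the OSDs run as plain ceph-osd@.service units (rather than in containers), one way to account for it is a systemd drop-in that orders them after LVM activation and stretches the start timeout. Treat this as a sketch only: the lvm2-activation unit names below exist only when event-based activation is disabled, so check what your distribution actually ships before copying it:

    # /etc/systemd/system/ceph-osd@.service.d/wait-for-lvm.conf
    [Unit]
    # don't start an OSD until boot-time LV activation has finished
    After=lvm2-activation.service lvm2-activation-early.service

    [Service]
    # give ~100 LVs time to appear before systemd gives up on the unit
    TimeoutStartSec=10min

followed by a systemctl daemon-reload.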

If I haven't got my facts too badly scrambled, LVs end up being mapped to device-mapper (dm) devices, but that's something I normally only pay attention to when hardware isn't behaving, so I'm not really an expert on that.
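
The mapping is easy enough to see with stock tools, if you're curious (again, generic commands, nothing specific to this cluster):

    lsblk                    # LVs show up as dm-N devices of type "lvm"
    dmsetup ls               # device-mapper names, typically vgname-lvname
    ls -l /dev/mapper/       # symlinks from those names to /dev/dm-N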

Hope that helps,

   Tim

On 4/10/25 16:43, Peter Grandi wrote:
I have 4 nodes with 112 OSDs each [...]
As an aside, I reckon that is not such a good idea, as Ceph was
designed for one-small-OSD per small-server and lots of them,
but lots of people of course know better.

Maybe you can gimme a hint how to struggle it over?
That is not so much a Ceph question as a distribution question;
anyhow, there are two possible hints that occur to me:

* In most distributions the automatic activation of block
   devices is done by the kernel plus 'udevd' rules and/or
   'systemd' units.

* There are timeouts for activation of storage devices and on a
   system with many, depending on type etc., there may be a
   default setting to activate them serially instead of in
   parallel to prevent sudden power consumption and other surges,
   so some devices may not activate because of timeouts.

You can start by asking the sysadmin for those machines to look
at the system logs (distribution dependent) for storage device
activation reports, to confirm whether the guesses above apply to
your situation; if confirmed, you can ask them to change the
relevant settings for the distribution used.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io