Peter,

I don't think udev factors in here, based on the original question. Firstly, I'm not sure udev deals with permanently-attached devices (it's more for hot-swap items); secondly, the original complaint mentioned LVM specifically.

I agree that the hosts seem overloaded, by the way. It sounds like large disks are being subdivided into many smaller ones, which would be a bad thing to do to Ceph on HDDs, and while SSDs don't have the seek and rotational liabilities of HDDs, it's still questionable how many OSDs you really should be carving out of one physical unit that way.

Ceph, for reasons I never discovered, prefers that you create OSDs that own either an entire physical disk or an LVM Logical Volume (LV), but NOT a plain disk partition. I find that curious, since LVs, unlike traditional partitions, aren't necessarily contiguous space (again, more of a liability for HDDs than SSDs), but there you are. Incidentally, LVs are contained in Volume Groups (VGs), and the whole thing can end up with parts scattered over multiple Physical Volumes (PVs).
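
If it helps to see the layering, here's a rough sketch of the PV -> VG -> LV chain under one OSD. The device and VG/LV names are made up for illustration, not taken from the original post, and ceph-volume will normally do all of this for you if you just hand it the raw disk:

    pvcreate /dev/sdb                                        # whole disk becomes an LVM Physical Volume
    vgcreate ceph-vg-sdb /dev/sdb                            # one Volume Group on that PV
    lvcreate -l 100%FREE -n osd-block-sdb ceph-vg-sdb        # one LV filling the VG
    ceph-volume lvm create --data ceph-vg-sdb/osd-block-sdb  # hand the LV to Ceph as an OSD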

When an LVM-supporting OS boots, part of the process is to scan for and activate the Logical Volumes (vgchange -ay, or the distribution's equivalent), and from the information given it looks like that activation hasn't completed before Ceph starts up and begins trying to use the LVs. Boot-time activation is normally pretty quick, since it would be rare to have more than a dozen or so LVs in the system.
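
If you want to check that guess, the standard LVM and journal tools should show it. These are generic commands, not anything specific to your boxes, and the exact log wording will vary by distribution:

    lvs -o lv_name,vg_name,lv_active     # per-LV "active" flag right now
    lvscan | grep -c ACTIVE              # quick count of active LVs
    journalctl -b -u 'lvm2-*'            # LVM activation messages from this boot
    journalctl -b -u 'ceph-osd@*'        # compare timestamps with the OSD start attempts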

But in this case, more than 100 LVs are being activated at boot time, and the systemd boot process apparently doesn't account for the extra time needed to do that.
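
If the OSDs run as plain ceph-osd@.service units (rather than in containers), one way to account for it is a systemd drop-in that orders them after LVM activation and stretches the start timeout. Treat this as a sketch only: the lvm2-activation unit names below exist only when event-based activation is disabled, so check what your distribution actually ships before copying it:

    # /etc/systemd/system/ceph-osd@.service.d/wait-for-lvm.conf
    [Unit]
    # don't start an OSD until boot-time LV activation has finished
    After=lvm2-activation.service lvm2-activation-early.service

    [Service]
    # give ~100 LVs time to appear before systemd gives up on the unit
    TimeoutStartSec=10min

followed by a systemctl daemon-reload.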

If I haven't got my facts too badly scrambled, LVs end up being mapped to device-mapper (dm) devices, but that's something I normally only pay attention to when hardware isn't behaving, so I'm not really an expert on that.
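
The mapping is easy enough to see with stock tools, if you're curious (again, generic commands, nothing specific to this cluster):

    lsblk                    # LVs show up as dm-N devices of type "lvm"
    dmsetup ls               # device-mapper names, typically vgname-lvname
    ls -l /dev/mapper/       # symlinks from those names to /dev/dm-N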

Hope that helps,

   Tim

On 4/10/25 16:43, Peter Grandi wrote:
I have 4 nodes with 112 OSDs each [...]
As an aside, I reckon that is not such a good idea, as Ceph was
designed for one-small-OSD per small-server and lots of them,
but lots of people of course know better.

Maybe you can gimme a hint how to struggle it over?
That is not so much a Ceph question as a distribution question;
anyhow, there are two possible hints that occur to me:

* In most distributions the automatic activation of block
   devices is done by the kernel plus 'udevd' rules and/or
   'systemd' units.

* There are timeouts for activation of storage devices and on a
   system with many, depending on type etc., there may be a
   default setting to activate them serially instead of in
   parallel to prevent sudden power consumption and other surges,
   so some devices may not activate because of timeouts.

You can start by asking the sysadmin for those machines to look
at the system logs (distribution dependent) for storage device
activation reports, to confirm whether the guesses above apply to
your situation; if confirmed, you can ask them to change the
relevant settings for the distribution used.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io