Filestore, IIRC, used partitions, with cute hex GPT partition-type GUIDs for the 
various states and roles.  Udev-based activation was sometimes problematic, and 
the LVM tags that replaced that approach are more flexible and reliable.  There 
is no doubt more to it, but that's what I recall.  
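
If you want to see what that looks like on a current LVM-based OSD, something 
like the following shows it (lvs and its lv_tags field are standard LVM; the 
exact ceph.* tag names ceph-volume sets are from memory, so treat the details 
as illustrative):

    # List LVs with their tags; ceph-volume records things like the OSD id,
    # OSD fsid and device role (block/db/wal) as LVM tags on each LV.
    lvs -o lv_name,vg_name,lv_tags --noheadings

    # Or have ceph-volume decode its own metadata:
    ceph-volume lvm list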

> On Apr 10, 2025, at 9:11 PM, Tim Holloway <t...@mousetech.com> wrote:
> 
> Peter,
> 
> I don't think udev factors in based on the original question. Firstly, 
> because I'm not sure udev deals with permanently-attached devices (it's more 
> for hot-swap items). Secondly, because the original complaint mentioned LVM 
> specifically.
> 
> I agree that the hosts seem overloaded, by the way. It sounds like large 
> disks are being subdivided into many smaller logical disks, which would be 
> bad for Ceph to do on HDDs, and while SSDs don't have the seek and rotational 
> liabilities of HDDs, it's still questionable how many OSDs you really should 
> be backing with one physical unit that way.
> 
> Ceph, for reasons I never discovered, prefers that you create OSDs that own 
> either an entire physical disk or an LVM Logical Volume, but NOT a disk 
> partition. I find that curious, since LVs, unlike traditional partitions, 
> aren't necessarily contiguous space (again, more of a liability for HDDs than 
> for SSDs), but there you are. Incidentally, LVs are contained in Volume 
> Groups (VGs), and the whole can end up with parts scattered over multiple 
> Physical Volumes (PVs).
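> 
> To make that concrete, the two "blessed" forms look roughly like this (device 
> and VG/LV names are invented; ceph-volume will carve out the LVM pieces 
> itself if you hand it a whole disk):
> 
>     # Whole physical disk: ceph-volume creates the PV/VG/LV behind the scenes
>     ceph-volume lvm create --data /dev/sdb
> 
>     # Pre-made Logical Volume: PV -> VG -> LV, then hand the LV over
>     pvcreate /dev/sdc
>     vgcreate ceph-block-0 /dev/sdc
>     lvcreate -l 100%FREE -n osd-block-0 ceph-block-0
>     ceph-volume lvm create --data ceph-block-0/osd-block-0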
> 
> When an LVM-supporting OS boots, part of the process is an LVM scan-and-
> activate pass (traditionally vgchange -ay, or udev-triggered pvscan --cache 
> --activate ay on newer systems) to locate and activate Logical Volumes, and 
> from the information given it appears that this activation hasn't completed 
> before Ceph starts up and begins trying to use the LVs. The boot-time 
> activation is normally pretty quick, since it would be rare to have more than 
> a dozen or so LVs on a system.
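> 
> A quick way to check that theory on an affected host (lv_active is a standard 
> lvs report field):
> 
>     # Any LV still inactive here never got activated during boot
>     lvs -o lv_name,vg_name,lv_active --noheadings
> 
>     # Activating everything by hand is the manual equivalent of the boot pass
>     vgchange -ay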
> 
> But in this case more than 100 LVs per host are being activated at boot time, 
> and the systemd boot process doesn't currently account for the extra time 
> needed to do that.
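> 
> One rough way around that, sketched under the assumption that the OSDs run as 
> the stock ceph-osd@.service units and that your distribution ships the 
> lvm2-activation units (check with systemctl list-unit-files; the drop-in file 
> name below is made up):
> 
>     # Order every OSD after LVM activation has finished
>     mkdir -p /etc/systemd/system/ceph-osd@.service.d
>     cat > /etc/systemd/system/ceph-osd@.service.d/wait-for-lvm.conf <<'EOF'
>     [Unit]
>     After=lvm2-activation.service lvm2-activation-early.service
>     EOF
>     systemctl daemon-reload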
> 
> If I haven't got my facts too badly scrambled, LVs end up being mapped to dm 
> (device-mapper) devices, but that's something I normally only pay attention 
> to when hardware isn't behaving, so I'm not really an expert on that.
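> 
> If you ever do want to peek at that layer, the mapping is visible with 
> ordinary tools:
> 
>     # Each active LV appears as a dm-N device with a /dev/mapper name
>     dmsetup ls
>     ls -l /dev/mapper/
>     lsblk -o NAME,TYPE,SIZE,MOUNTPOINT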
> 
> Hope that helps,
> 
>    Tim
> 
> On 4/10/25 16:43, Peter Grandi wrote:
>>> I have a 4 nodes with 112 OSDs each [...]
>> As an aside, I reckon that is not such a good idea, as Ceph was
>> designed for one small OSD per small server and lots of them,
>> but lots of people of course know better.
>> 
>>> Maybe you can gimme a hint how to struggle it over?
>> That is not so much a Ceph question as a distribution question;
>> anyhow, two possible hints occur to me:
>> 
>> * In most distributions the automatic activation of block
>>   devices is done by the kernel plus 'udevd' rules and/or
>>   'systemd' units.
>> 
>> * There are timeouts for the activation of storage devices, and
>>   on a system with many of them there may be, depending on type
>>   etc., a default setting to activate them serially instead of in
>>   parallel, to prevent sudden surges in power consumption and the
>>   like, so some devices may not activate before the timeout
>>   expires (see the sketch after this list).
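>> 
>> As a sketch of the timeout part of that second point
>> (DefaultDeviceTimeoutSec is a real systemd-system.conf option; the
>> value here is only an example, and the change takes effect at the
>> next boot):
>> 
>>     # /etc/systemd/system.conf -- raise the per-device wait from the
>>     # usual 90s default when many devices come up slowly or serially
>>     DefaultDeviceTimeoutSec=300s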
>> 
>> You can start by asking the sysadmin for those machines to look
>> at the system logs (distribution dependent) for storage-device
>> activation reports, to confirm whether the guesses above apply to
>> your situation; if they do, you can ask them to change the
>> relevant settings for the distribution used.
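>> 
>> For example (unit and message patterns vary by distribution, so
>> treat the grep as illustrative):
>> 
>>     # Storage/LVM-related messages from the current boot
>>     journalctl -b | grep -iE 'lvm|ceph-osd|timed out'
>> 
>>     # Which units took longest to start
>>     systemd-analyze blame | head -30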
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
