Common motivations for this strategy include the lure of unit economics and of 
saving rack units (RUs). 

Often ultra-dense servers can’t fill racks anyway due to power and weight 
limits. 

Here osd_memory_target would have to be severely reduced to avoid 
OOM-killing.  Assuming the OSDs are top-load LFF HDDs with expanders, the HBA 
will be a bottleneck as well.  I’ve suffered similar systems for RGW.  All the 
clever juggling in the world could not override the math, and the solution was 
QLC. 
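
For scale: with 50 OSDs per node (300 OSDs across 6 hosts, per the thread 
below), even the default 4 GiB osd_memory_target implies ~200 GiB per node for 
OSDs alone.  Squeezing the target is one line; the 2 GiB value here is purely 
illustrative, not a recommendation, and it starves the BlueStore caches.  Note 
that the target is best-effort, not a hard cap, so recovery can still 
overshoot it:

    # Illustrative only: halve the default 4 GiB osd_memory_target
    ceph config set osd osd_memory_target 2147483648   # 2 GiB per OSD
    # Even then, 50 OSDs x 2 GiB is ~100 GiB per node before the OS and
    # any colocated daemons get a byte.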

“We can lose 4 servers”

Do you realize that your data would then be unavailable?  With only one of 
five MONs left, there is no quorum.  And when you lose even one server, you 
will not be able to restore redundancy, and your OSDs will likely be 
OOM-killed under the recovery load. 
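
The arithmetic that the claim misses, as a quick sketch:

    # MON quorum is a strict majority: floor(n/2) + 1 must be up.
    for n in 3 5; do
      q=$(( n / 2 + 1 ))
      echo "${n} mons: quorum ${q}, tolerates $(( n - q )) down"
    done
    # 3 mons tolerate 1 down, 5 mons tolerate 2.  Losing 4 of 5 service
    # hosts leaves a single mon: no quorum, no client I/O at all.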

If you’re running CephFS, how are you provisioning fast OSDs for the metadata 
pool?  Are the CPUs high-clock for MDS responsiveness? 
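
For reference, one common approach is a CRUSH rule restricted to the ssd 
device class; the rule name, the pool name, and the presence of ssd-class 
OSDs are all assumptions here:

    # Sketch: pin the CephFS metadata pool to SSD-class OSDs
    ceph osd crush rule create-replicated meta-ssd default host ssd
    ceph osd pool set cephfs_metadata crush_rule meta-ssd   # pool name varies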

Even given those caveats, this seems like a recipe for disappointment at best.  

At the very least, add RAM: 8 GB per OSD, plus ample headroom for the other 
daemons.  Better would be three additional normal-sized hosts dedicated to 
the non-OSD daemons. 
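
What that guidance means per node, with the thread's 50 OSDs per node:

    # RAM demand for OSDs alone, at the 4 GB default vs. a comfortable 8 GB
    for gb in 4 8; do
      echo "${gb} GB/OSD x 50 OSDs = $(( gb * 50 )) GB per node"
    done
    # 200 GB vs. 400 GB, before the OS, and before any colocated
    # MON/MDS/mgr daemons are budgeted for.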

> On Nov 17, 2023, at 8:33 PM, Simon Kepp <si...@kepp.tech> wrote:
> 
> I know that your question is regarding the service servers, but may I ask
> why you are planning to place so many OSDs (300) on so few OSD hosts (6)
> (= 50 OSDs per node)?
> This is possible to do, but it sounds like the nodes were designed for
> scale-up rather than for a scale-out architecture like Ceph. Going with such
> "fat nodes" is doable, but it will significantly limit performance,
> reliability, and availability compared to distributing the same OSDs
> across more, thinner nodes.
> 
> Best regards,
> Simon Kepp
> 
> Founder/CEO
> Kepp Technologies
> 
>> On Fri, Nov 17, 2023 at 10:59 AM Albert Shih <albert.s...@obspm.fr> wrote:
>> 
>> Hi everyone,
>> 
>> For the purpose of deploying a medium-size Ceph cluster (300 OSDs), we have
>> 6 bare-metal servers for the OSDs and 5 bare-metal servers for the services
>> (MDS, MON, etc.).
>> 
>> Those 5 bare-metal servers each have 48 cores and 256 GB of RAM.
>> 
>> What would be the smartest way to use those 5 servers? I see two ways:
>> 
>>  First:
>> 
>>    Server 1: MDS, MON, Grafana, Prometheus, web UI
>>    Server 2: MON
>>    Server 3: MON
>>    Server 4: MDS
>>    Server 5: MDS
>> 
>>  So: 3 MDS, 3 MON, and we can lose 2 servers.
>> 
>>  Second:
>> 
>>    KVM on each server:
>>      Server 1: 3 VMs: one for Grafana & co., one MDS, one MON
>>      Other servers: 1 MDS, 1 MON each
>> 
>>  In total: 5 MDS, 5 MON, and we can lose 4 servers.
>> 
>> So on paper the second seems smarter, but it's also more complex, so my
>> question is: «is it worth the complexity to have 5 MDS/MON for 300
>> OSDs?»
>> 
>> Important: the main goal of this Ceph cluster is not to get maximum I/O
>> speed. I would not say speed is not a factor, but it is not the main
>> point.
>> 
>> Regards.
>> 
>> 
>> --
>> Albert SHIH 🦫 🐸
>> Observatoire de Paris
>> France
>> Heure locale/Local time:
>> ven. 17 nov. 2023 10:49:27 CET
