> On Apr 11, 2025, at 4:04 AM, gagan tiwari <gagan.tiw...@mathisys-india.com> wrote:
>
> Hi Anthony,
> Thanks for the reply!
>
> We will be using CephFS to access Ceph storage from clients, so this will also need the MDS daemon.

MDS is single-threaded, so unlike most Ceph daemons it benefits more from a high-frequency CPU than from core count.

> So, based on your advice, I am thinking of having 4 Dell PowerEdge servers. 3 of them will run the 3 Monitor daemons and one of them will run the MDS daemon.
>
> These Dell servers will have the following hardware:
>
> 1. 4 cores ( 8 threads ) ( can go for 8 cores and 16 threads )
>
> 2. 64G RAM
>
> 3. 2x4T Samsung SSD with RAID 1 to install the OS on and to run the Monitor and Metadata services.

That probably suffices for a small cluster. Are those Samsungs enterprise?

> OSD nodes will be upgraded to have 32 cores ( 64 threads ). Disk and RAM will remain the same ( 128G and 22x8T Samsung SSD ).

Which Samsung SSD? Using client SKUs for OSDs has a way of leading to heartbreak. 64 threads would be better for a 22x OSD node, though still a bit light. Are these SATA or NVMe?

> Actually, I want to use the OSD nodes to run OSD daemons and no other daemons, which is why I am thinking of having the 4 additional Dell servers mentioned above.

Colocation of daemons is common these days, especially with smaller clusters.

> Please advise if this plan will be better.

That'll work, but unless you already have those quite-modest 4x non-OSD nodes sitting around idle, you might consider just going with the OSD nodes and bumping the CPU again so you can colocate all the daemons.

> Thanks,
> Gagan
>
> On Wed, Apr 9, 2025 at 8:12 PM Anthony D'Atri <anthony.da...@gmail.com> wrote:
>
>>> We would start deploying Ceph with 4 hosts ( HP ProLiant servers ), each running RockyLinux 9.
>>>
>>> One of the hosts, called ceph-adm, will be a smaller one and will have the following hardware:
>>>
>>> 2x4T SSD with RAID 1 to install the OS on.
>>>
>>> 8 cores with 3600MHz freq.
>>>
>>> 64G RAM
>>>
>>> We are planning to run all Ceph daemons except the OSD daemons, like Monitor, Metadata, etc., on this host.
>>
>> 8 cores == 16 threads? Are you provisioning this node because you have it lying around idle?
>>
>> Note that you will want *at least* 3 Monitor daemons, which must be on different nodes. 5 is better, but at least 3. You'll also have Grafana, Prometheus, and MDS (if you're going with CephFS vs using S3 object storage or RBD block).
>>
>> 8c is likely on the light side for all of that. You would also benefit from not having that node be a single point of failure. I would suggest, if you can, raising this node to the spec of the planned 3x OSD nodes so you have 4x equivalent nodes, and spreading the non-OSD daemons across them.
>>
>> Note also that your OSD nodes will have node_exporter, crash, and other boilerplate daemons.
>>
>>> We will have 3 hosts to run OSDs, which will store the actual data.
>>>
>>> Each OSD host will have the following hardware:
>>>
>>> 2x4T SSD with RAID 1 to install the OS on.
>>>
>>> 22x8T SSD to store data ( OSDs ). We will use the entire disk without partitions.
>>
>> SAS, SATA, or NVMe SSDs? Which specific model? You really want to avoid client (desktop) models for Ceph, but you likely do not need to pay for higher-endurance mixed-use SKUs.
>>
>>> Each OSD host will have 128G RAM ( no swap space ).
>>
>> Thank you for skipping swap. Some people are really stuck in the past in that regard.
>>
>>> Each OSD host will have 16 cores.
>>
>> So 32 threads total? That is very light for 22 OSDs + other daemons. A common rule of thumb is at minimum 2 threads per HDD OSD, 4 per SAS/SATA SSD OSD, and 6 per NVMe SSD OSD, plus margin for the OS and other processes.
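
To put rough numbers on that, here is a quick Python sketch of the thread math. The 2/4/6 multipliers are just the rules of thumb above, and the 8-thread margin for the OS and colocated daemons is my own assumption, so treat the output as ballpark:

    # Rough CPU-thread sizing for one OSD node, per the rules of thumb above.
    # The os_margin default is an assumption; raise it if you colocate more daemons.
    THREADS_PER_OSD = {"hdd": 2, "sata_ssd": 4, "nvme_ssd": 6}

    def threads_needed(num_osds: int, media: str, os_margin: int = 8) -> int:
        """Estimate how many CPU threads one OSD node wants."""
        return num_osds * THREADS_PER_OSD[media] + os_margin

    print(threads_needed(22, "sata_ssd"))  # 96 -- so 64 threads is still a bit light
    print(threads_needed(22, "nvme_ssd"))  # 140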
>>> All 4 hosts will connect to each other via 10G NICs.
>>
>> Two ports with bonding? Redundant switches?
>>
>>> The 500T data
>>
>> The specs you list above include 528 TB of *raw* space. Be advised that with three OSD nodes you will necessarily be doing replication; for safety, replication with size=3. Taking into consideration TB vs TiB and headroom, you're looking at 133 TiB of usable space. You could go with size=2 to get 300 TB of usable space, but at increased risk of data unavailability or loss when drives/hosts fail or reboot.
>>
>> With at least 4 OSD nodes (even if they aren't fully populated with capacity drives) you could do EC for a more favorable raw:usable ratio, at the expense of slower writes and recovery. With 4 nodes you could in theory do 2,2 EC for 200 TiB of usable space, with 5 you could do 3,2 for 240 TiB usable, etc.
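
If it helps, here is a rough Python sketch of that raw-to-usable arithmetic. The 3 x 22 x 8 TB layout comes from your specs; the ~83% fill factor I use for headroom is my own assumption, so the outputs are ballpark, not promises:

    # Rough raw -> usable capacity math for 3 nodes x 22 x 8 TB SSDs.
    # The 0.83 fill factor (headroom for recovery and imbalance) is an assumption.
    TB, TiB = 10**12, 2**40

    raw_tib = 3 * 22 * 8 * TB / TiB   # ~480 TiB raw
    fill = 0.83

    def usable_replicated(size: int) -> float:
        return raw_tib / size * fill

    def usable_ec(k: int, m: int) -> float:
        return raw_tib * k / (k + m) * fill

    print(f"size=3 : {usable_replicated(3):.0f} TiB")  # ~133 TiB
    print(f"EC 2,2 : {usable_ec(2, 2):.0f} TiB")       # ~199 TiB
    print(f"EC 3,2 : {usable_ec(3, 2):.0f} TiB")       # ~239 TiB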
>>> will be accessed by the clients. We need to have read performance as fast as possible.
>>
>> Hope your SSDs are enterprise NVMe.
>>
>>> We can't afford data loss and downtime.
>>
>> Then no size=2 for you.
>>
>>> So, we want to have a Ceph deployment which serves our purpose.
>>>
>>> So, please advise me if the plan that I have designed will serve our purpose. Or is there a better way? Please advise on that.
>>>
>>> Thanks,
>>> Gagan
>>>
>>> We have an HP storage server with 12 SSDs of 5T each and have set up hardware RAID6 on these disks.
>>>
>>> The HP storage server has 64G RAM and 18 cores.
>>>
>>> So, please advise how I should go about setting up Ceph on it to have the best read performance. We need the fastest read performance.
>>>
>>> Thanks,
>>> Gagan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io