Hi Anthony,

We will be using Samsung SSD 870 QVO 8TB disks on all the OSD servers.
One more thing: I want to know whether CephFS supports mounting with FS-Cache on the clients. The 500T of data stored in the cluster will be accessed by jobs running on the client nodes, and we need very fast read performance. For that we have an additional cache disk installed on every client node. Does CephFS support mounting with FS-Cache on the client hosts the same way NFSv4 does for NFS shares? (A rough sketch of what I have in mind is at the end of this message.)

On those 4x non-OSD nodes, I will probably run LDAP and the HTCondor service. But the MDS node will not be used for anything other than the MDS daemon.

Thanks,
Gagan

On Fri, Apr 11, 2025 at 8:45 PM Anthony D'Atri <anthony.da...@gmail.com> wrote:

> > On Apr 11, 2025, at 4:04 AM, gagan tiwari <gagan.tiw...@mathisys-india.com> wrote:
> >
> > Hi Anthony,
> > Thanks for the reply!
> >
> > We will be using CephFS to access Ceph storage from the clients, so this
> > will also need an MDS daemon.
>
> MDS is single-threaded, so unlike most Ceph daemons it benefits more from
> a high-frequency CPU than from core count.
>
> > So, based on your advice, I am thinking of having 4 Dell PowerEdge servers.
> > 3 of them will run the 3 monitor daemons and one of them will run the MDS
> > daemon.
> >
> > These Dell servers will have the following hardware:
> >
> > 1. 4 cores ( 8 threads ) ( can go for 8 cores and 16 threads )
> >
> > 2. 64G RAM
> >
> > 3. 2x4T Samsung SSD with RAID 1 to install the OS and run the monitor and
> > metadata services.
>
> That probably suffices for a small cluster. Are those Samsungs enterprise?
>
> > OSD nodes will be upgraded to have 32 cores ( 64 threads ). Disk and RAM
> > will remain the same ( 128G and 22x8T Samsung SSD ).
>
> Which Samsung SSD? Using client SKUs for OSDs has a way of leading to
> heartbreak.
>
> 64 threads would be better for a 22x OSD node, though still a bit light.
> Are these SATA or NVMe?
>
> > Actually, I want to use the OSD nodes to run OSD daemons and not any
> > other daemons, which is why I am thinking of having the 4 additional Dell
> > servers mentioned above.
>
> Colocation of daemons is common these days, especially with smaller
> clusters.
>
> > Please advise if this plan will be better.
>
> That'll work, but unless you already have those quite-modest 4x non-OSD
> nodes sitting around idle, you might consider just going with the OSD nodes
> and bumping the CPU again so you can colocate all the daemons.
>
> > Thanks,
> > Gagan
> >
> > On Wed, Apr 9, 2025 at 8:12 PM Anthony D'Atri <anthony.da...@gmail.com>
> > wrote:
> >
> >>
> >>>
> >>> We would start deploying Ceph with 4 hosts ( HP ProLiant servers ), each
> >>> running RockyLinux 9.
> >>>
> >>> One of the hosts, called ceph-adm, will be a smaller one and will have
> >>> the following hardware:
> >>>
> >>> 2x4T SSD with RAID 1 to install the OS on.
> >>>
> >>> 8 cores with 3600MHz frequency.
> >>>
> >>> 64G RAM
> >>>
> >>> We are planning to run all Ceph daemons except the OSD daemons, like
> >>> monitor, metadata, etc., on this host.
> >>
> >> 8 cores == 16 threads? Are you provisioning this node because you have it
> >> laying around idle?
> >>
> >> Note that you will want *at least* 3 monitor (mon) daemons, which
> >> must be on different nodes. 5 is better, but at least 3. You'll also have
> >> Grafana, Prometheus, and MDS (if you're going with CephFS vs using S3
> >> object storage or RBD block).
> >>
> >> 8c is likely on the light side for all of that. You would also benefit
> >> from not having that node be a single point of failure.
> >> I would suggest, if you can, raising this node to the spec of the planned
> >> 3x OSD nodes so you have 4x equivalent nodes, and spreading the non-OSD
> >> daemons across them.
> >>
> >> Note also that your OSD nodes will also have node_exporter, crash, and
> >> other boilerplate daemons.
> >>
> >>> We will have 3 hosts to run the OSDs, which will store the actual data.
> >>>
> >>> Each OSD host will have the following hardware:
> >>>
> >>> 2x4T SSD with RAID 1 to install the OS on.
> >>>
> >>> 22x8T SSD to store data ( OSDs ). We will use the entire disk without
> >>> partitions.
> >>
> >> SAS, SATA, or NVMe SSDs? Which specific model? You really want to avoid
> >> client (desktop) models for Ceph, but you likely do not need to pay for
> >> higher-endurance mixed-use SKUs.
> >>
> >>> Each OSD host will have 128G RAM ( no swap space ).
> >>
> >> Thank you for skipping swap. Some people are really stuck in the past in
> >> that regard.
> >>
> >>> Each OSD host will have 16 cores.
> >>
> >> So 32 threads total? That is very light for 22 OSDs + other daemons. For
> >> HDD OSDs a common rule of thumb is at minimum 2x threads per OSD; for
> >> SAS/SATA SSDs, 4; for NVMe SSDs, 6. Plus margin for the OS and other
> >> processes.
> >>
> >>> All 4 hosts will connect to each other via 10G NICs.
> >>
> >> Two ports with bonding? Redundant switches?
> >>
> >>> The 500T data
> >>
> >> The specs you list above include 528 TB of *raw* space. Be advised that
> >> with three OSD nodes, you will necessarily be doing replication. For
> >> safety, replication with size=3. Taking into consideration TB vs TiB and
> >> headroom, you're looking at roughly 133 TiB of usable space. You could go
> >> with size=2 to get 300TB of usable space, but at increased risk of data
> >> unavailability or loss when drives/hosts fail or reboot.
> >>
> >> With at least 4 OSD nodes - even if they aren't fully populated with
> >> capacity drives - you could do EC for a more favorable raw:usable ratio,
> >> at the expense of slower writes and recovery. With 4 nodes you could in
> >> theory do 2,2 EC for 200 TiB of usable space, with 5 you could do 3,2 for
> >> 240 TiB usable, etc. (A sketch of such an EC profile is at the end of
> >> this message.)
> >>
> >>> will be accessed by the clients. We need the read performance to be as
> >>> fast as possible.
> >>
> >> Hope your SSDs are enterprise NVMe.
> >>
> >>> We can't afford data loss and downtime.
> >>
> >> Then no size=2 for you.
> >>
> >>> So, we want to have a Ceph deployment which serves our purpose.
> >>>
> >>> So, please advise me if the plan that I have designed will serve our
> >>> purpose, or if there is a better way, please advise on that.
> >>>
> >>> Thanks,
> >>> Gagan
> >>>
> >>> We have an HP storage server with 12 SSDs of 5T each and have set up
> >>> hardware RAID6 on these disks.
> >>>
> >>> The HP storage server has 64G RAM and 18 cores.
> >>>
> >>> So, please advise how I should go about setting up Ceph on it to have
> >>> the best read performance. We need the fastest read performance.
> >>>
> >>> Thanks,
> >>> Gagan
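
P.S. For reference, this is roughly what I have in mind for the FS-Cache mount on the clients, if I understand the kernel CephFS client correctly. It is only a sketch: the monitor names, client name, secret file path, and cache directory below are placeholders, not our real values.

    # Sketch only - assumes the kernel CephFS client plus the cachefilesd
    # package; all names and paths here are illustrative.

    # 1. Point cachefilesd at the local cache disk (e.g. mounted at
    #    /var/cache/fscache) via "dir /var/cache/fscache" in
    #    /etc/cachefilesd.conf, then start it:
    systemctl enable --now cachefilesd

    # 2. Mount CephFS with the 'fsc' option so the kernel client uses FS-Cache:
    mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs \
          -o name=myclient,secretfile=/etc/ceph/myclient.secret,fsc

My understanding is that FS-Cache mainly helps with repeated reads of the same files, so it complements rather than replaces fast OSD media.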
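
If we do end up adding a 4th OSD node, I believe the 2+2 EC setup mentioned above would look roughly like the following; the profile, pool, and filesystem names are placeholders and this is only viable with at least 4 OSD hosts.

    # Illustrative only - profile and pool names are made up:
    ceph osd erasure-code-profile set ec-2-2 k=2 m=2 crush-failure-domain=host
    ceph osd pool create cephfs_data_ec erasure ec-2-2
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true   # needed to use an EC pool for CephFS data
    ceph fs add_data_pool <fsname> cephfs_data_ec

One caveat I am aware of: with exactly 4 hosts and k+m=4, there is no spare host to recover onto if one host is down, so the pool would stay degraded until that host returns.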