Hello Anthony D'Atri, thanks for your detailed reply.
1 - Yes, we have Proxmox nodes, but as you said they don't use any shared storage right now and run as single-node Proxmox. I now have a task to move them into a clustered Proxmox with shared storage. We are deploying external Ceph to avoid the dependency on a single node in a hyperconverged setup, and we also want to use Ceph for services other than Proxmox, so we think it is better to go with external Ceph.

Sorry, I don't think I explained my Ceph cluster correctly. *Total 5 nodes*, where I want to colocate services:

- 3 nodes will have MON, MGR and OSD services colocated.
- 2 nodes will be primarily OSD nodes, but can be expanded to run other services if needed, since we plan to buy similar hardware specs for all nodes.

What do you think we should reserve in cores and RAM per OSD and per service, given that we want to populate the OSDs up to full chassis capacity (24 NVMe bays) in the longer run?

Sure, I will explore the Dell R7615 with the 9454, or more likely 32 cores (AMD EPYC 9334 2.70GHz, 32C/64T) because of cost.

*RAM per node:* We are going with 32GB DIMMs, which allows more capacity growth in the future (for now 4 x 32GB = 128G).

*OSDs per node:* 5x 7.68TB Data Center NVMe Read Intensive AG Drive U.2 with carrier, OR 10x 3.84TB Data Center NVMe Read Intensive AG Drive U.2 with carrier. Which one is better?

*Networking:*
- Public network: 2x25G ports as bond0
- Cluster network: 2x25G ports as bond1
- (Proxmox will also have 2x25G ports in a bond.)
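For reference, here is roughly how I picture the two networks in the Ceph configuration; the subnets below are just placeholders until we finalize addressing:

    [global]
    # bond0 (2x25G): client / Proxmox traffic
    public_network = 10.10.10.0/24
    # bond1 (2x25G): OSD replication and recovery
    cluster_network = 10.10.20.0/24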
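And the colocation plan, sketched with cephadm labels and placements (hostnames are placeholders; if we follow your suggestion we would add the mon/mgr labels to all 5 hosts so a daemon can be respawned when a node is down):

    # repeat the label commands for each host that should be able to run MON/MGR
    ceph orch host label add ceph-node1 mon
    ceph orch host label add ceph-node1 mgr
    ceph orch apply mon --placement="label:mon"
    ceph orch apply mgr --placement="label:mgr"
    # OSDs on every host with free NVMe devices
    ceph orch apply osd --all-available-devices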
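On the Proxmox side, my understanding of the "network and keyring" hookup Alex describes below is roughly this; the storage ID, pool, username and monitor addresses are placeholders:

    # /etc/pve/storage.cfg entry pointing at the external cluster
    rbd: ceph-ext
            content images,rootdir
            krbd 0
            monhost 10.10.10.11 10.10.10.12 10.10.10.13
            pool proxmox-vms
            username proxmox

    # plus the matching keyring copied to /etc/pve/priv/ceph/ceph-ext.keyring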
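Also noting the IOMMU change you mention below for the EPYC nodes; I assume it is something along these lines in /etc/default/grub, with the exact parameter to be verified before we rely on it:

    # /etc/default/grub (sketch)
    GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"
    # or amd_iommu=off to disable it outright, then regenerate the config:
    #   update-grub                                (Debian/Ubuntu)
    #   grub2-mkconfig -o /boot/grub2/grub.cfg     (EL)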
On Wed, Jul 9, 2025 at 8:26 PM Alex Gorbachev <a...@iss-integration.com> wrote:

> Completely agreeing with what Anthony wrote, and we see very good results
> with at least 4 physical OSD nodes, managed and deployed by cephadm - you
> will have 3 MONs and MGRs "hyperconverged" in cephadm sense, and run 3x
> replication for OSD with an extra OSD host for n+1 redundancy.
>
> Proxmox just needs a network and keyring to talk to this cluster. You can
> run deployment and automation functions from a VM in Proxmox that runs on
> local storage.
>
> --
> Alex Gorbachev
> https://alextelescope.blogspot.com
>
> On Wed, Jul 9, 2025 at 10:28 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
>
>> > I am new to this thread would like to get some suggestions to build new
>> > external ceph cluster
>>
>> Why external? Many Proxmox deployments are converged. Is this an
>> existing Proxmox cluster that currently does not use shared storage?
>>
>> > which will backend for proxmox VM's
>> >
>> > I am planning to start with 5 Nodes(3 Mon & 2 OSD)
>>
>> This is not the best plan.
>>
>> If your data is not disposable you will want to maintain the default 3
>> copies, which you cannot safely do on 2 OSD nodes.
>>
>> When deploying a very small cluster solve first for the number of nodes.
>> You need at least 3 OSD nodes, 4 has advantages.
>>
>> So in your case, go converged: OSDs on all 5 nodes, and add the
>> mon/mgr/etc ceph orch labels to all 5 so that when a node is down a
>> replacement may be spun up.
>>
>> This would also let you deploy 5 mon instances instead of 3, which is
>> advantageous in that you can ride out 2 failures without disruption.
>>
>> > and I am expecting to start with ~60+ TB usable space.
>>
>> That would mean (3 * 60) / .85 = 211.765 ~ 212 TB of raw capacity, let’s
>> see how that matches your numbers below.
>>
>> > estimated Storage Specs Calculator:
>> >
>> > RAM: 8GB/OSD Daemon, 16GB OS, 4GB for Mon & MGR, 16GB for MDS
>>
>> I would allot more than 4GB for mon/mgr.
>>
>> > cpu: 2 core/osd, 2 core for os, 2 core per services
>>
>> Cores or hyperthreads? Either way these numbers are low.
>>
>> > *Dell R7625 5 Node to start with *
>>
>> Dramatic overkill for a mon/mgr/MDS node.
>>
>> > - RAM: 128G (Plan to increase later as needed)
>>
>> I suggest 32GB DIMMs to maximize potential for future expansion.
>>
>> > - CPU: 2x AMD EPYC 9224 2.50GHz, 24C/48T, 64M Cache (200W) DDR5-4800
>>
>> 96 threads total per server.
>>
>> > - Chassis Configuration 24x2.5 NVME
>>
>> You’ll be tempted to fill those slots; each OSD past, say, 12 will
>> decrease performance due to having to share the vcores/threads.
>> With the above CPU choice I would go with the R7615 to save rack space,
>> or bump up the CPU. The 9224 is the default choice on Dell’s configurator
>> but there are lots of others available. The 9454 for example would give you
>> enough cores to more comfortably service an eventual 24 OSDs.
>>
>> Alternately consider the R7615 with, say, the 9654P. The P CPUs can’t be
>> used in a dual-socket motherboard, so they’re usually a bit cheaper for the
>> same specs.
>>
>> With EPYC CPUs you can get better performance by disabling IOMMU on the
>> kernel command line via GRUB defaults.
>>
>> > - 2x1.92TB Data Center NVMe Read Intensive AG Drive U2 Gen4 with carrier (
>> > OS Disk, I need extra space)
>>
>> Okay so that will limit you to 22 OSDs with the 24-bay chassis. You
>> could provision BOSS-N1 for M.2 boot though.
>>
>> > - 5x 7.68TB Data Center NVMe Read Intensive AG Drive U2 Gen4 with Carrier
>> > 24Gbps 512e 2.5in Hot-Plug 1DWPD , AG Drive
>>
>> I think you have a copy/paste error there. The second line above sounds
>> like a SAS SSD.
>>
>> So from what you wrote about this would intend a total of 10x 7.68TB OSD
>> drives. With 3x replication and the default headroom ratios these will
>> give you about 22 TB of usable space, which is just 20 TiB.
>>
>> > - 2x Nvidia ConnectX-6 Lx Dual Port 10/25GbE SFP28, No Crypto, PCIe Low
>> > Profile
>>
>> I suggest bonding them and not having an optional replication network.
>> Some people will use one port for public and the other for replication, but
>> for multiple reasons that wouldn’t be ideal.
>>
>> > - 1G for IPMI
>> >
>> > Please help me finalize these specs.
>> >
>> > Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io