[ceph-users] Re: Looking for recommendations on Ceph node specifications

Adam Prycki via ceph-users Tue, 26 May 2026 08:56:41 -0700

In the past we have deployed HDD OSDs (without SSD block.db) and SSD OSDs for an S3 cluster. It worked fine.

From our understanding, on a Ceph cluster consisting of HDDs with SSD block.db, most of the RGW metadata (index shards) will end up on the SSD block.db. So, if we can afford it, an HDD+block.db setup for S3 would be better for general performance and recovery speeds.


Is this assumption correct?

Best regards
Adam Prycki

On 19/05/2026 18:44, Tony Liu wrote:

For RBD on HDD with the DB on SSD (typically one SSD shared by multiple HDDs), you get better write performance, but the SSDs wear out fast. For S3, with the index on SSD and data and DB on HDD, it will work fine. For density, it will normally be fine, but in case of failure or maintenance, recovery on dense HDDs will be very slow. For EC, 8+3 is fine on HDD when there is no performance requirement. We tried 6+3 on NVMe; CPU usage was very high, which affected performance. For networking, 2x50G or 2x100G is too much for HDD; 2x25G is sufficient.
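To put "very slow" into rough numbers, here is a back-of-the-envelope sketch of how long recreating one dense drive's data might take. The model and the rates are hypothetical inputs, not measurements from any cluster in this thread: real recovery throughput depends heavily on recovery throttles, client load and how many OSDs share the work.

```python
def recovery_hours(used_tb: float, participating_osds: int,
                   mb_per_s_per_osd: float) -> float:
    """Rough hours needed to recreate `used_tb` of lost data.

    Hypothetical model: the work is spread evenly over `participating_osds`,
    each sustaining `mb_per_s_per_osd` of recovery writes. Real recovery is
    uneven and throttled, so treat the result as an optimistic lower bound.
    """
    total_mb = used_tb * 1_000_000                # decimal TB -> MB
    aggregate_mb_per_s = participating_osds * mb_per_s_per_osd
    return total_mb / aggregate_mb_per_s / 3600   # seconds -> hours


# Assumed example: 20 TB of data that lived on one dense HDD.
print(f"{recovery_hours(20, 1, 100):.1f} h")   # bottlenecked on a single drive
print(f"{recovery_hours(20, 10, 50):.1f} h")   # spread over 10 throttled OSDs
```

The time scales linearly with drive capacity, which is the core of the density concern: doubling the HDD size roughly doubles the window in which redundancy is reduced.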


Tony
________________________________________
From: Anthony D'Atri via ceph-users <[email protected]>
Sent: May 19, 2026 08:45 AM
To: Adam Prycki
Cc: [email protected]
Subject: [ceph-users] Re: Looking for recommendations on Ceph node specifications


At least 16 or 32 nodes
in 16 racks.
Erasure coding 8+3 with failure domain on rack level.

Ack.

We initially selected 8+3 over 4+2 because we expect rebuilds to take very long with nodes this big and we don't want to lose redundancy.
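For reference, the basic arithmetic behind the two profiles, assuming the rack failure domain described above so that each of the k+m chunks must land in a different rack. The ECProfile class is a hypothetical illustration, not a Ceph API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECProfile:
    k: int  # data chunks
    m: int  # coding chunks (failure domains that may be lost without data loss)

    @property
    def usable_fraction(self) -> float:
        return self.k / (self.k + self.m)

    @property
    def raw_overhead(self) -> float:
        # raw bytes stored per byte of user data
        return (self.k + self.m) / self.k

    def min_failure_domains(self) -> int:
        # every chunk of a placement group needs its own rack
        return self.k + self.m

    def spare_domains(self, racks: int) -> int:
        # racks left over for CRUSH to rebuild chunks after a rack failure
        return racks - self.min_failure_domains()


for profile in (ECProfile(8, 3), ECProfile(4, 2)):
    print(f"{profile.k}+{profile.m}: usable {profile.usable_fraction:.1%}, "
          f"overhead {profile.raw_overhead:.3f}x, "
          f"spare racks out of 16: {profile.spare_domains(16)}")
```

With exactly k+m racks, CRUSH has no other rack to rebuild a failed rack's chunks into, so PGs would stay degraded until that rack returns; the 5 spare racks that 8+3 leaves in a 16-rack layout are what allow self-healing.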


Fair enough.  You get more nines with m=3 for sure, though the wider profile itself will mean slower scrubs and recovery.  I suspect you set mon_osd_down_out_subtree_limit?



Splitting a JBOD logically into 2 servers isn't an issue for us because we will replicate data at the rack level, not the host level.


Common specifications for all variants

5-6 GB of RAM per HDD

Plus more for mons and other daemons? Especially MDS?


Other daemons will be on some dedicated non-storage servers.


Ack.  Had to ask.

We aim for low RAM/HDD on storage nodes. Other daemons won't fit there.

2% of HDD capacity in NVMe devices for block.db (or none)
2x 50Gb or 2x 100Gb Ethernet per server (active-backup bonded interfaces)
(CPU per OSD to be determined)
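Applying the common specification to a single node is simple arithmetic. The sketch below uses hypothetical inputs (60 HDDs of 20 TB each, chosen only for illustration). Two points worth keeping in mind: Ceph's default osd_memory_target is 4 GiB per OSD, so 5-6 GB per HDD leaves some headroom above it, and with active-backup bonding only one link carries traffic at a time, so 2x 50Gb yields 50 Gb/s of usable bandwidth, not 100.

```python
def node_sizing(hdd_count: int, hdd_tb: float,
                ram_gb_per_hdd: float = 6.0,
                db_fraction: float = 0.02,
                link_gbps: float = 50.0) -> dict:
    """Per-node figures for the common spec; all inputs are hypothetical."""
    raw_tb = hdd_count * hdd_tb
    return {
        "ram_gb": hdd_count * ram_gb_per_hdd,          # OSD RAM only, no other daemons
        "raw_tb": raw_tb,
        "db_total_tb": raw_tb * db_fraction,           # NVMe capacity for block.db
        "db_per_osd_gb": hdd_tb * db_fraction * 1000,  # block.db slice per HDD
        # active-backup bond: only one link is active, so count one link
        "gbps_per_hdd": link_gbps / hdd_count,
    }


for key, value in node_sizing(60, 20).items():
    print(f"{key}: {value:g}")
```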


Variant A1 is very unlikely to happen, but we are curious what network interface speeds you would suggest for so many HDDs in one node.

100GE bonded at the least.  Depends on your workload.


Variant A2 is most likely the one we will choose for a large deployment.

Variant B1/B2 for smaller deployments.

Do any of you run Ceph on similar setups? Did you find any pitfalls with it?

What are your minimum recommendations for network speed per HDD, CPU per HDD, etc.?

In our experience, most of our servers, even in large clusters, never max out the network interfaces or CPUs. We almost never rebuild or rebalance whole servers. The 27 HDD nodes of our biggest CephFS cluster with EC usually see only 2-3 Gbps of network traffic.
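A quick way to see which side is the ceiling on an HDD node is to compare aggregate disk throughput against one active link. The 150 MB/s per-HDD figure below is an assumption for large sequential I/O; mixed or small-object workloads get far less per drive, which fits the low 2-3 Gbps traffic reported above. The node shapes are hypothetical.

```python
def disk_ceiling_gbps(hdd_count: int, hdd_mb_per_s: float) -> float:
    """Aggregate streaming throughput of the HDDs, converted MB/s -> Gb/s."""
    return hdd_count * hdd_mb_per_s * 8 / 1000


def bottleneck(hdd_count: int, hdd_mb_per_s: float, link_gbps: float) -> str:
    """Return which side limits a node: the disks or the single active link."""
    if disk_ceiling_gbps(hdd_count, hdd_mb_per_s) > link_gbps:
        return "network"
    return "disks"


# Hypothetical node shapes, assumed 150 MB/s per HDD, one 25 Gb/s link active.
for hdds in (12, 24, 60):
    print(hdds, f"{disk_ceiling_gbps(hdds, 150):.1f} Gb/s", bottleneck(hdds, 150, 25))
```

Even where the network is nominally the limit, in line with the experience described above that ceiling tends to matter mainly during large sequential transfers or bulk recovery rather than in day-to-day archival traffic.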

Your workload is archival?


Yes, mostly archival.
We have high demand for both S3 and CephFS.
But we may move to a pure S3 cluster in the future.

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]