[ceph-users] Re: Looking for recommendations on Ceph node specifications

Tony Liu via ceph-users Tue, 26 May 2026 10:20:11 -0700

It depends on how RGW pools are configured.
https://docs.ceph.com/en/squid/radosgw/pools/


There are two layers: OSD and pool.
An OSD can be 1) pure HDD, 2) pure SSD, or 3) HDD with the DB on SSD.
By setting a CRUSH rule, you can choose which OSDs serve which pool,
e.g. #1 for rgw.data and #2 for rgw.index.
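
A minimal sketch of that pool-to-OSD mapping, assuming the split is done with
CRUSH device classes. It only builds and prints the CLI commands; it does not
run them. The pool names are those of the default RGW zone, and the rule
names, CRUSH root "default" and failure domain "host" are hypothetical
choices, so adjust them to your cluster. The data pool is treated as
replicated here; an EC data pool would take its device class from the
erasure-code profile (crush-device-class) instead. As far as I know, an HDD
OSD with its DB on SSD still reports device class "hdd", so telling #1 and #3
apart would need a custom device class.

```python
# Sketch only: prints Ceph CLI commands, never executes them.
# Assumed (not from the thread): rule names, CRUSH root, failure domain.

def crush_rule_cmd(rule, device_class, root="default", failure_domain="host"):
    """Command that creates a replicated CRUSH rule limited to one device class."""
    return (
        "ceph osd crush rule create-replicated "
        f"{rule} {root} {failure_domain} {device_class}"
    )


def assign_rule_cmd(pool, rule):
    """Command that points an existing pool at a CRUSH rule."""
    return f"ceph osd pool set {pool} crush_rule {rule}"


# (pool, hypothetical rule name, device class)
PLAN = [
    ("default.rgw.buckets.index", "rgw-index-ssd", "ssd"),
    ("default.rgw.buckets.data", "rgw-data-hdd", "hdd"),
]


def build_commands(plan=PLAN):
    """Return the commands in the order they would be run."""
    cmds = []
    for pool, rule, device_class in plan:
        cmds.append(crush_rule_cmd(rule, device_class))
        cmds.append(assign_rule_cmd(pool, rule))
    return cmds


if __name__ == "__main__":
    for cmd in build_commands():
        print(cmd)
```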

Tony
________________________________________
From: Adam Prycki via ceph-users <[email protected]>
Sent: May 26, 2026 08:57 AM
To: Tony Liu; Anthony D'Atri
Cc: [email protected]
Subject: [ceph-users] Re: Looking for recommendations on Ceph node specifications

In the past we have deployed HDD OSDs (without SSD block.db) and SSD
OSDs for an S3 cluster. It worked fine.

From our understanding, on a Ceph cluster consisting of HDDs with SSD block.db,
most of the RGW metadata (index shards) will end up on the SSD block.db.
So, if we can afford it, an HDD+block.db setup for S3 would be better for
general performance and recovery speeds.

Is this assumption correct?

Best regards
Adam Prycki

On 19/05/2026 18:44, Tony Liu wrote:
> For RBD on HDD with the DB on SSD (typically shared by multiple HDDs), you get
> better write performance, but the SSD will wear out fast. For S3, index on SSD
> with data and DB on HDD will work fine. For density, it will normally be fine,
> but in case of failure or maintenance, recovery on dense HDDs will be very slow.
> For EC, 8+3 is fine on HDD when there is no performance requirement. We tried
> 6+3 on NVMe, and CPU usage was very high, which affected performance. For
> networking, 2x50G or 2x100G is too much for HDD; 2x25G is sufficient.
>
>
> Tony
> ________________________________________
> From: Anthony D'Atri via ceph-users <[email protected]>
> Sent: May 19, 2026 08:45 AM
> To: Adam Prycki
> Cc: [email protected]
> Subject: [ceph-users] Re: Looking for recommendations on Ceph node specifications
>
>>> .
>>
>> At least 16 or 32 nodes
>> in 16 racks.
>> Erasure coding 8+3 with failure domain on rack level.
>>
> Ack.
>
>
>> We initially selected 8+3 over 4+2 because we expect rebuilds to take very
>> long with nodes this big and we don't want to lose redundancy.
>
> Fair enough.  You get more nines with m=3 for sure, though the wider profile 
> itself will mean slower scrubs and recovery.  I suspect you set 
> mon_osd_down_out_subtree_limit?
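
For reference, the trade-off between the two profiles can be worked out
directly. This is a small sketch of the standard k+m arithmetic only; it says
nothing about scrub or recovery times, which depend on the hardware and
workload. With a rack-level failure domain, a profile needs at least k+m
racks, so 8+3 needs 11.

```python
from fractions import Fraction


def ec_profile(k, m):
    """Basic properties of an erasure-code profile with k data and m coding chunks."""
    return {
        "chunks": k + m,                          # failure domains needed per PG
        "raw_per_usable": Fraction(k + m, k),     # raw bytes stored per usable byte
        "usable_fraction": Fraction(k, k + m),    # share of raw capacity that is usable
        "tolerated_failures": m,                  # chunks that can be lost without data loss
    }


if __name__ == "__main__":
    for k, m in [(8, 3), (4, 2)]:
        p = ec_profile(k, m)
        print(
            f"{k}+{m}: {p['chunks']} chunks, "
            f"raw/usable = {float(p['raw_per_usable']):.3f}, "
            f"usable = {float(p['usable_fraction']):.1%}, "
            f"survives {p['tolerated_failures']} lost chunks"
        )
```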
>
>>
>>>>
>>>>
>>>> Splitting a JBOD logically into 2 servers isn't an issue for us because we
>>>> will replicate data at rack level, not host level.
>>>>
>>>>
>>>> Common specifications for all variants
>>>>
>>>> 5-6GB of RAM per 1 HDD
>>> Plus more for mons and other daemons? Especially MDS?
>>
>> Other daemons will be on some dedicated non-storage servers.
>
> Ack.  Had to ask.
>
>> We aim for low RAM/HDD on storage nodes. Other daemons won't fit there.
>>
>>>> 2% of HDD capacity in NVMe devices for block.db (or none)
>>>> 2x 50Gb or 2x 100Gb Ethernet per server (active-backup bonded interfaces)
>>>> (CPU per OSD to be determined)
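
A rough sketch of what the common specifications above imply per node. The
HDD count and HDD size in the usage example are hypothetical inputs, not
figures from this thread; the 5-6 GB RAM per HDD and 2% block.db ratios are
the ones quoted. With an active-backup bond only one link carries traffic,
so the per-HDD network share is computed from a single link. Decimal units
(1 TB = 1000 GB) are assumed.

```python
def node_sizing(hdd_count, hdd_tb, link_gbit, ram_gb_per_hdd=(5, 6), db_percent=2):
    """Per-node totals derived from per-HDD ratios.

    hdd_count, hdd_tb and link_gbit are caller-supplied (hypothetical) inputs.
    """
    db_gb_per_hdd = hdd_tb * 1000 * db_percent / 100
    # Active-backup bonding: only one link's bandwidth is usable at a time.
    link_mb_s = link_gbit * 1000 / 8
    return {
        "ram_gb_min": hdd_count * ram_gb_per_hdd[0],
        "ram_gb_max": hdd_count * ram_gb_per_hdd[1],
        "db_gb_per_hdd": db_gb_per_hdd,
        "db_tb_total": hdd_count * db_gb_per_hdd / 1000,
        "net_mb_s_per_hdd": link_mb_s / hdd_count,
    }


if __name__ == "__main__":
    # Hypothetical node: 60 HDDs of 20 TB each behind 2x 50Gb active-backup.
    for key, value in node_sizing(hdd_count=60, hdd_tb=20, link_gbit=50).items():
        print(f"{key}: {value:.1f}")
```

The same function can be called with link_gbit=25 or 100 to compare the
interface options discussed in this thread.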
>>>>
>>>>
>>>> Variant A1 is very unlikely to happen, but we are curious what network
>>>> interface speeds you would suggest for so many HDDs in one node.
>>> 100GE bonded at the least.  Depends on your workload.
>>>>
>>>> Variant A2 is most likely the one we will choose for a large deployment.
>>>>
>>>> Variant B1/B2 for smaller deployments.
>>>>
>>>> Do any of you run Ceph on similar setups? Did you find any pitfalls
>>>> with it?
>>>>
>>>> What are your minimal recommendations for network speed per HDD, CPU per
>>>> HDD, etc.?
>>>>
>>>> In our experience most of our servers, even in large clusters, never max 
>>>> out the network interfaces or CPUs. We almost never rebuild or rebalance 
>>>> whole servers. 27 HDD nodes of our biggest CephFS cluster with EC usually 
>>>> have only 2-3Gbps of network traffic.
>>> Your workload is archival?
>>
>> Yes, mostly archival.
>> We have big demand for S3 and CephFS.
>> But we may move to a pure S3 cluster in the future.
>>
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]