Agree, 3+2 is the widest you can safely go with 6x failure domains.

As Lukasz touches upon, ultra-dense nodes are especially problematic when there 
are only a few of them.  You will want to attend to 
mon_osd_down_out_subtree_limit to prevent automated recovery when an entire 
node is unavailable; otherwise you'll find your cluster refusing to backfill or 
even refusing writes.
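
Something along these lines (a sketch; check the option against your release's 
docs, and adjust the level if your failure domain isn't "host"):

  # check the current value first
  ceph config get mon mon_osd_down_out_subtree_limit

  # don't automatically mark OSDs "out" when an entire host (or anything
  # larger in the CRUSH tree) is down; a single down OSD is still marked
  # out as usual after mon_osd_down_out_interval
  ceph config set mon mon_osd_down_out_subtree_limit host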

HDDs are incompatible with consistent throughput.  Whatever you measure 
initially with an empty cluster, you will never see again, as the drives fill 
up and become increasingly fragmented.  You will spend a lot of time in 
rotational and especially seek latency.  There's also a good chance your HBAs 
will be saturated, and your SATA interfaces absolutely will be.  It is not 
uncommon for HDD deployments to cap unit capacity at 8TB because of this.  
Figure at most 70 MB/s real-world write throughput to a given HDD, and remember 
that with a 3+2 profile each client write will have to touch five drives (one 
shard per OSD).  Recovery / backfill will measurably impact your client 
experience.
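
Back-of-envelope, using the 70 MB/s figure above with your 540 OSDs and 3+2, 
and ignoring metadata, deep scrubs, and recovery traffic (so an optimistic 
ceiling, not a prediction):

  client write of X bytes -> k+m = 5 shards of X/3 each
  bytes hitting disk       = (5/3) * X  ~= 1.67X
  raw HDD write bandwidth  = 540 drives * 70 MB/s ~= 37.8 GB/s
  client write ceiling    <= 37.8 GB/s / 1.67 ~= 22.7 GB/s aggregate,
                             long before seek latency and saturated
                             HBA/SATA links shave that down further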

These are among the false economies of spinners.

> On Dec 9, 2024, at 8:25 AM, Lukasz Borek <[email protected]> wrote:
> 
> I'd start with 3+2, so you have one node left for recovery in case one
> fails. 6-node and 90 hdd per node sounds like a long recovery that needs to
> be tested for sure.
> 
> On Mon, 9 Dec 2024 at 06:10, Phong Tran Thanh <[email protected]>
> wrote:
> 
>> Hi community,
>> 
>> Please help with advice on selecting an erasure coding algorithm for a
>> 6-node cluster with 540 OSDs. What would be the appropriate values for *k*
>> and *m*? The cluster requires a high level of HA and consistent
>> throughput.
>> 
>> Email: [email protected]
> 
> 
> -- 
> Łukasz Borek
> [email protected]
