> Hi Janne,
> Thanks for your advice.
>
> So, you mean with K=4, M=2 EC, we need 8 OSD nodes to have better
> protection
As always, it is a tradeoff in cost, speed, availability, storage size
and so on. What you need, if the data is important, is for the cluster
to be able to heal itself.

Running X hosts in a Replication=X setup, or N+M hosts in an EC N+M
setup, works fine while everything is fine. Unfortunately, drives die,
OSes crash, hosts get failed PSUs, and sometimes a simple maintenance
task goes slightly wrong and downtime becomes far longer than expected.
In any such case, where one host is out for a long time, a cluster with
exactly the minimum number of hosts cannot repair itself while that
host is missing, because it was already at the minimum and is now one
step under it. This means that from being fully functional you are one
crash away from being degraded and at risk of data loss, or at least
from a cluster that goes read-only to protect the data against any new
unexpected surprises. If you have N+M+1 hosts or more, the cluster can
recover onto the "excess" hosts' drives and, at some point afterwards,
become fully functional again without your intervention.

One more thing to consider: you should never fill a cluster to 85% or
more, and this needs to take crashes into account as well. If you have
EC 4+2 on 7 hosts and they are 83% full, and one OSD host goes down,
the cluster will not be able to recreate that host's data on the
remaining 6 OSD hosts without pushing them over 85% full. So not only
should you have more hosts than the EC N+M says, you should also keep
spare drive capacity and expand early, to avoid getting into a
situation where you can't repair because the drives are almost full
everywhere.

This is easy to see if you compare the impact of losing one OSD host
when you have 6: 16.6% of the total data needs to spread out over the
remaining 5 hosts, which is a noticeable amount per host. If you had
100 OSD hosts and one crashes, only 1% of the total data needs to be
spread out over the remaining 99, and even if that is the same absolute
amount of space to be rewritten/recreated, the extra data for each host
becomes very small. You can skip going to the datacenter, let it
recover by itself, and if another host dies the week after, it's still
only ~1.1% to be spread out over 98 hosts, again something that is very
manageable without panicking. (The PS below has a quick sketch of this
arithmetic.) If you have EC 4+2 on 6 hosts and one dies in the middle
of the night, it's time to get in the car as soon as possible.

>> Still, if you have EC 4+2 and only 6 OSD hosts, this means if a host
>> dies, the cluster cannot recreate data anywhere without violating the
>> "one copy per host" default placement, so the cluster will be
>> degraded until this host comes back or another one replaces it. For
>> an N+M EC cluster, I would suggest having N+M+1 or even +2 hosts, so
>> that you can do maintenance on a host or lose a host and still be
>> able to recover without visiting the server room.

--
May the most significant bit of your life be positive.
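
PS: since the capacity math is easy to get wrong, here is a quick
back-of-the-envelope sketch in Python. The 85% nearfull threshold and
the host counts are the ones from this thread; the helper name and the
even-spread assumption are mine (real placement is per-OSD and per-PG,
so this is only the averaged approximation used above).

    # Rough model: when one host dies, its share of the data (1/hosts)
    # must be re-created across the survivors. If that pushes the
    # survivors past the ~85% nearfull mark, the cluster cannot fully
    # heal until capacity is added.

    def fill_after_host_loss(hosts: int, fill_before: float) -> float:
        # Average fill of the survivors, assuming data spreads evenly.
        return fill_before * hosts / (hosts - 1)

    for hosts, fill in [(6, 0.83), (7, 0.83), (100, 0.83)]:
        share = 1 / hosts                      # data that must move
        after = fill_after_host_loss(hosts, fill)
        verdict = "room to heal" if after < 0.85 else "NO room to heal"
        print(f"{hosts:3d} hosts at {fill:.0%}: lost host held "
              f"{share:.1%}, survivors would be ~{after:.1%} full "
              f"-> {verdict}")

    # Note: with EC 4+2 on exactly 6 hosts the heal is blocked anyway,
    # since the default one-shard-per-host placement has nowhere to put
    # the sixth shard, regardless of free space.

Running it shows the point of the thread: at 83% full, 6 or 7 hosts end
up well over 85% after a single host loss, while 100 hosts barely move
(~83.8%).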