On 10/04/2019 18.11, Christian Balzer wrote:
> Another thing that crossed my mind, aside from failure probabilities
> caused by actual HDDs dying, is of course the little detail that most
> Ceph installations will have WAL/DB (journal) on SSDs, the most typical
> ratio being 1:4.
> And given the current thread about compaction killing pure HDD OSDs,
> something you may _have_ to do.
> 
> So if you get unlucky and an SSD dies, 4 OSDs are irrecoverably lost,
> unlike a dead node that can be recovered.
> Combine that with the background noise of HDDs failing, and things just
> got quite a bit scarier.
Certainly, your failure domain should be at least host, and that changes
the math (even without considering whole-host failure).

Let's say you have 375 hosts and 4 OSDs per host, with the failure
domain correctly set to host. Same 50000 pool PGs as before. Now if 3
hosts die:

50000 / (375 choose 3) =~ 0.57% chance of data loss

This is equivalent to having 3 shared SSDs die.
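
If you want to double-check that number, here's a quick Python sketch (my
own back-of-the-envelope code, not output from any Ceph tool; it just
assumes each PG maps to a uniformly random set of 3 distinct hosts):

    from math import comb  # Python 3.8+

    pgs = 50000
    hosts = 375

    # chance that 3 simultaneous host failures cover all 3 hosts of some PG
    p_three_hosts = pgs / comb(hosts, 3)
    print(f"{p_three_hosts:.2%}")  # ~0.57%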

If 3 random OSDs die in different hosts, the chance of data loss would
be 0.57% / (4^3) =~ 0.00896% (a 1-in-4 chance per host that you hit the
OSD a PG actually lives on, and you need to hit all 3). This is
marginally higher than the ~0.00891% with uniformly distributed PGs,
because you've eliminated all the sets of OSDs which share a host.
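
Same sketch, extended to the two per-OSD numbers above (again just my own
illustration of the arithmetic, under the same assumptions):

    from math import comb  # Python 3.8+

    pgs = 50000
    hosts, osds_per_host = 375, 4

    p_three_hosts = pgs / comb(hosts, 3)

    # 3 dead OSDs in 3 different hosts: a 1-in-4 chance per host of hitting
    # the OSD the PG lives on, so divide by 4^3
    p_diff_hosts = p_three_hosts / osds_per_host ** 3
    print(f"{p_diff_hosts:.5%}")  # ~0.00896%

    # for comparison: 3 dead OSDs drawn uniformly from all 1500 OSDs
    p_uniform = pgs / comb(hosts * osds_per_host, 3)
    print(f"{p_uniform:.5%}")     # ~0.00891%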


-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub