Ziggy,

For EC pools: min_size = k+1.
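[Editor's note: a minimal Python sketch of that shard arithmetic, assuming the min_size = k+1 rule above; `ec_pool_math` is a made-up helper name, not Ceph code.]

```python
def ec_pool_math(k, m):
    """Shard math for a Ceph erasure-coded pool (assuming min_size = k+1)."""
    shards = k + m                  # total chunks written per object
    min_size = k + 1                # PGs stop serving IO below this many shards
    tolerable = shards - min_size   # shard losses the pool can absorb while staying available
    return shards, min_size, tolerable

# Default profile k=2, m=1: 3 shards, min_size 3 -> losing ANY shard freezes IO
print(ec_pool_math(2, 1))  # (3, 3, 0)

# k=2, m=2: 4 shards, min_size still 3 -> one shard (i.e. one host, with a
# 'host' failure domain) can fail without losing availability
print(ec_pool_math(2, 2))  # (4, 3, 1)
```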
So in your case (m=1) -> min_size is 3, which is the same as the number of
shards. So if ANY shard goes down, IO is frozen. If you choose m=2, min_size
will still be 3, but you now have 4 shards (k+m = 4), so you can lose a
shard and still retain availability. Of course, a failure domain of 'host'
is required to do this, but since you have 6 hosts that would be ok.

Met vriendelijke groet,

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend
t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

2018-07-20 14:02 GMT+02:00 Ziggy Maes <ziggy.m...@be-mobile.com>:

> Caspar,
>
> Thank you for your reply. I’m in all honesty still not clear on what
> value to use for min_size. From what I understand, it should be set to
> the sum of k+m for erasure coded pools, as it is set by default.
>
> Additionally, could you elaborate on why m=2 would be able to sustain a
> node failure? As stated, we have 6 hosts containing 4 OSDs each (so 24 in
> total). What would m=2 achieve that m=1 would not?
>
> Kind regards
>
> Ziggy Maes
> DevOps Engineer
> CELL +32 478 644 354
> SKYPE Ziggy.Maes
> www.be-mobile.com
>
> *From: *Caspar Smit <caspars...@supernas.eu>
> *Date: *Friday, 20 July 2018 at 13:36
> *To: *Ziggy Maes <ziggy.m...@be-mobile.com>
> *Cc: *"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> *Subject: *Re: [ceph-users] Default erasure code profile and sustaining
> loss of one host containing 4 OSDs
>
> Ziggy,
>
> The default min_size for your pool is 3, so losing ANY single OSD (not
> even a host) will result in reduced data availability:
>
> https://patchwork.kernel.org/patch/8546771/
>
> Use m=2 to be able to handle a node failure.
>
> 2018-07-20 13:11 GMT+02:00 Ziggy Maes <ziggy.m...@be-mobile.com>:
>
> Hello
>
> I am currently trying to find out if Ceph can sustain the loss of a full
> host (containing 4 OSDs) in a default erasure coded pool (k=2, m=1). We
> currently have a production EC pool with the default erasure profile,
> but would like to make sure the data on this pool remains accessible
> even after one of our hosts fails. Since we have a very small cluster
> (6 hosts, 4 OSDs per host), I created a custom CRUSH rule to make sure
> the 3 chunks are spread over 3 hosts, screenshot here:
> https://gyazo.com/1a3ddd6895df0d5e0e425774d2bcb257
>
> Unfortunately, taking one node offline results in reduced data
> availability and incomplete PGs, as shown here:
> https://gyazo.com/db56d5a52c9de2fd71bf9ae8eb03dbbc
>
> My question summed up: is it possible to sustain the loss of a host
> containing 4 OSDs using a k=2, m=1 erasure profile with a CRUSH map that
> spreads data over at least 3 hosts? If so, what am I doing wrong? I
> realize the documentation states that m equals the number of OSDs that
> can be lost, but assuming a balanced CRUSH map is used I fail to see how
> this is required.
>
> Many thanks in advance.
>
> Kind regards
>
> Ziggy Maes
> DevOps Engineer
> www.be-mobile.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com