Ziggy,

For EC pools: min_size = k+1.
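[Editor's note: a minimal Python sketch of that shard arithmetic, assuming the min_size = k+1 rule above; `ec_pool_math` is a made-up helper name, not Ceph code.]

```python
def ec_pool_math(k, m):
    """Shard math for a Ceph erasure-coded pool (assuming min_size = k+1)."""
    shards = k + m                  # total chunks written per object
    min_size = k + 1                # PGs stop serving IO below this many shards
    tolerable = shards - min_size   # shard losses the pool can absorb while staying available
    return shards, min_size, tolerable

# Default profile k=2, m=1: 3 shards, min_size 3 -> losing ANY shard freezes IO
print(ec_pool_math(2, 1))  # (3, 3, 0)

# k=2, m=2: 4 shards, min_size still 3 -> one shard (i.e. one host, with a
# 'host' failure domain) can fail without losing availability
print(ec_pool_math(2, 2))  # (4, 3, 1)
```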
So in your case (m=1) -> min_size is 3, which is the same as the number of
shards. So if ANY shard goes down, IO is frozen. If you choose m=2, min_size
will still be 3, but you now have 4 shards (k+m = 4), so you can lose a
shard and still retain availability. Of course, a failure domain of 'host'
is required to do this, but since you have 6 hosts that would be ok.

Met vriendelijke groet,

Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend
t: (+31) 299 410 414
e: caspars...@supernas.eu
w: www.supernas.eu

2018-07-20 14:02 GMT+02:00 Ziggy Maes <ziggy.m...@be-mobile.com>:

> Caspar,
>
> Thank you for your reply. I’m in all honesty still not clear on what
> value to use for min_size. From what I understand, it should be set to
> the sum of k+m for erasure coded pools, as it is set by default.
>
> Additionally, could you elaborate on why m=2 would be able to sustain a
> node failure? As stated, we have 6 hosts containing 4 OSDs each (so 24 in
> total). What would m=2 achieve that m=1 would not?
>
> Kind regards
>
> Ziggy Maes
> DevOps Engineer
> CELL +32 478 644 354
> SKYPE Ziggy.Maes
> www.be-mobile.com
>
> *From: *Caspar Smit <caspars...@supernas.eu>
> *Date: *Friday, 20 July 2018 at 13:36
> *To: *Ziggy Maes <ziggy.m...@be-mobile.com>
> *Cc: *"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> *Subject: *Re: [ceph-users] Default erasure code profile and sustaining
> loss of one host containing 4 OSDs
>
> Ziggy,
>
> The default min_size for your pool is 3, so losing ANY single OSD (not
> even a host) will result in reduced data availability:
>
> https://patchwork.kernel.org/patch/8546771/
>
> Use m=2 to be able to handle a node failure.
>
> 2018-07-20 13:11 GMT+02:00 Ziggy Maes <ziggy.m...@be-mobile.com>:
>
> Hello
>
> I am currently trying to find out if Ceph can sustain the loss of a full
> host (containing 4 OSDs) in a default erasure coded pool (k=2, m=1). We
> currently have a production EC pool with the default erasure profile,
> but would like to make sure the data on this pool remains accessible
> even after one of our hosts fails. Since we have a very small cluster
> (6 hosts, 4 OSDs per host), I created a custom CRUSH rule to make sure
> the 3 chunks are spread over 3 hosts, screenshot here:
> https://gyazo.com/1a3ddd6895df0d5e0e425774d2bcb257
>
> Unfortunately, taking one node offline results in reduced data
> availability and incomplete PGs, as shown here:
> https://gyazo.com/db56d5a52c9de2fd71bf9ae8eb03dbbc
>
> My question summed up: is it possible to sustain the loss of a host
> containing 4 OSDs using a k=2, m=1 erasure profile with a CRUSH map that
> spreads data over at least 3 hosts? If so, what am I doing wrong? I
> realize the documentation states that m equals the number of OSDs that
> can be lost, but assuming a balanced CRUSH map is used I fail to see how
> this is required.
>
> Many thanks in advance.
>
> Kind regards
>
> Ziggy Maes
> DevOps Engineer
> www.be-mobile.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com