On Thu, Aug 23, 2018 at 10:21 AM Cody <codeology....@gmail.com> wrote:
> So, is it okay to say that compared to the 'firstn' mode, the 'indep'
> mode may have the least impact on a cluster in the event of an OSD
> failure? Could I use 'indep' for a replicated pool as well?
>

You could, but shouldn't. Imagine if the primary OSD fails and you're
using indep: then the new primary won't know anything at all about the
PG, so it's just going to have to set a pg_temp mapping that gives it
back to one of the old nodes anyway!

In the EC case that happens too, but it's unavoidable: all nodes have
individual data stored, so on the loss of a primary you're going to need
a few more round-trips anyway (and in fact EC pools regularly have a
primary which isn't the first in the list, unlike replicated ones).
-Greg

>
> Thank you!
>
> Regards,
> Cody
> On Wed, Aug 22, 2018 at 7:12 PM Gregory Farnum <gfar...@redhat.com> wrote:
> >
> > On Wed, Aug 22, 2018 at 12:56 AM Konstantin Shalygin <k0...@k0ste.ru> wrote:
> >>
> >> > Hi everyone,
> >> >
> >> > I read an earlier thread [1] that gave a good explanation of the 'step
> >> > choose|chooseleaf' option. Could someone further help me to understand
> >> > the 'firstn|indep' part? Also, what is the relationship between 'step
> >> > take' and 'step choose|chooseleaf' when it comes to defining a failure
> >> > domain?
> >> >
> >> > Thank you very much.
> >>
> >>
> >> This is documented under CRUSH Map Rules [1]
> >>
> >>
> >> [1]
> >> http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/#crush-map-rules
> >>
> >
> > But that doesn't seem to really discuss it, and I don't see it elsewhere
> > in our docs either. So:
> >
> > "indep" and "firstn" are two different strategies for selecting items
> > (mostly, OSDs) in a CRUSH hierarchy. If you're storing EC data you want to
> > use indep; if you're storing replicated data you want to use firstn.
> >
> > The reason has to do with how they behave when a previously-selected
> > device fails. Let's say you have a PG stored on OSDs 1, 2, 3, 4, 5. Then 3
> > goes down.
> >
> > With the "firstn" mode, CRUSH simply adjusts its calculation in a way
> > that it selects 1 and 2, then selects 3 but discovers it's down, so it
> > retries and selects 4 and 5, and then goes on to select a new OSD 6. So the
> > final CRUSH mapping change is
> > 1, 2, 3, 4, 5 -> 1, 2, 4, 5, 6.
> >
> > But if you're storing an EC pool, that means you just changed the data
> > mapped to OSDs 4, 5, and 6! That's terrible! So the "indep" mode attempts
> > to not do that. (It still *might* conflict, but the odds are much lower).
> > You can instead expect it, when it selects the failed 3, to try again and
> > pick out 6, for a final transformation of:
> > 1, 2, 3, 4, 5 -> 1, 2, 6, 4, 5
> > -Greg
> >
> >>
> >> k
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
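
To make the two mappings in the quoted explanation concrete, here is a
minimal Python sketch. It is a toy model and not the real CRUSH algorithm:
real CRUSH draws candidates by hashing over a weighted hierarchy, whereas
this sketch assumes a fixed candidate order of OSD 1, 2, 3, and so on. The
function names (firstn, indep, changed_slots) and the candidate list are
hypothetical, chosen only for illustration. It models just the outcome
described above: firstn closes the gap left by a failed OSD and appends a
new one at the end, while indep keeps every surviving OSD in its slot and
refills only the failed slot.

```python
# Toy model only. Assumption: candidates are tried in a fixed order
# (OSD 1, 2, 3, ...) instead of CRUSH's hash-based draws.

def firstn(candidates, down, n):
    """Walk the candidates in order, skipping down OSDs, until n are chosen.

    Survivors after a failed OSD shift one position to the left.
    """
    chosen = []
    for osd in candidates:
        if len(chosen) == n:
            break
        if osd not in down:
            chosen.append(osd)
    return chosen


def indep(candidates, down, n):
    """Keep each slot's original pick; replace only the slots whose OSD is down.

    None marks a slot that could not be refilled (an assumption of this sketch).
    """
    slots = list(candidates[:n])
    spares = [osd for osd in candidates[n:] if osd not in down]
    for i, osd in enumerate(slots):
        if osd in down:
            slots[i] = spares.pop(0) if spares else None
    return slots


def changed_slots(before, after):
    """Return the 0-based positions whose OSD differs between two mappings."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


candidates = list(range(1, 11))            # hypothetical cluster: OSDs 1..10
before = firstn(candidates, set(), 5)      # [1, 2, 3, 4, 5]
after_firstn = firstn(candidates, {3}, 5)  # [1, 2, 4, 5, 6]
after_indep = indep(candidates, {3}, 5)    # [1, 2, 6, 4, 5]

print(changed_slots(before, after_firstn))  # [2, 3, 4]
print(changed_slots(before, after_indep))   # [2]
```

In an EC pool each position holds a different shard, so the three changed
positions under firstn mean three shards have to move, while indep only has
to rebuild the one shard that lived on the failed OSD.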