On Thu, Aug 23, 2018 at 10:21 AM Cody <codeology....@gmail.com> wrote:

> So, is it fair to say that, compared to the 'firstn' mode, the 'indep'
> mode may have less impact on a cluster in the event of an OSD
> failure? Could I use 'indep' for a replicated pool as well?
>

You could, but shouldn't. Imagine if the primary OSD fails and you're using
indep: then the new primary won't know anything at all about the PG, so
it's just going to have to set a pgtemp mapping that gives it back to one
of the old nodes anyway!
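
To make the primary-failure case concrete, here is a minimal Python sketch.
It is a toy model of the two orderings, not the CRUSH algorithm: the helper
names and the `spare` OSD are made up for illustration, and it assumes the
first OSD in the list acts as primary.

```python
# Toy model only: real CRUSH picks OSDs by hashing over the hierarchy.
# Helper names and the "spare" argument are hypothetical.

def firstn_after_failure(acting, failed, spare):
    """firstn-like: survivors shift left in order; the replacement goes last."""
    return [osd for osd in acting if osd != failed] + [spare]

def indep_after_failure(acting, failed, spare):
    """indep-like: only the failed slot is refilled; other slots stay put."""
    return [spare if osd == failed else osd for osd in acting]

old = [1, 2, 3]  # OSD 1 is the primary (first in the list)
firstn_new = firstn_after_failure(old, failed=1, spare=4)
indep_new = indep_after_failure(old, failed=1, spare=4)

# Does the new first OSD already hold a copy of the PG?
print(firstn_new, "new primary held the PG before:", firstn_new[0] in old)
print(indep_new, "new primary held the PG before:", indep_new[0] in old)
```

In this model the firstn-style result promotes a surviving replica to the
front, while the indep-style result puts the brand-new OSD first, which is
the situation that needs the pgtemp detour.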

In the EC case that happens too, but it's unavoidable: all nodes have
individual data stored, so on the loss of a primary you're going to need a
few more round-trips anyway (and in fact EC pools regularly have a primary
which isn't the first in the list, unlike replicated ones).
-Greg


>
> Thank you!
>
> Regards,
> Cody
> On Wed, Aug 22, 2018 at 7:12 PM Gregory Farnum <gfar...@redhat.com> wrote:
> >
> > On Wed, Aug 22, 2018 at 12:56 AM Konstantin Shalygin <k0...@k0ste.ru> wrote:
> >>
> >> > Hi everyone,
> >> >
> >> > I read an earlier thread [1] that gave a good explanation of the 'step
> >> > choose|chooseleaf' option. Could someone further help me understand
> >> > the 'firstn|indep' part? Also, what is the relationship between 'step
> >> > take' and 'step choose|chooseleaf' when it comes to defining a failure
> >> > domain?
> >> >
> >> > Thank you very much.
> >>
> >>
> >> This is documented under CRUSH Map Rules [1]
> >>
> >>
> >> [1] http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/#crush-map-rules
> >>
> >
> > But that doesn't seem to really discuss it, and I don't see it elsewhere
> > in our docs either. So:
> >
> > "indep" and "firstn" are two different strategies for selecting items
> > (mostly, OSDs) in a CRUSH hierarchy. If you're storing EC data you want to
> > use indep; if you're storing replicated data you want to use firstn.
> >
> > The reason has to do with how they behave when a previously-selected
> > device fails. Let's say you have a PG stored on OSDs 1, 2, 3, 4, 5. Then 3
> > goes down.
> > With the "firstn" mode, CRUSH simply adjusts its calculation in a way
> > that it selects 1 and 2, then selects 3 but discovers it's down, so it
> > retries and selects 4 and 5, and then goes on to select a new OSD 6. So the
> > final CRUSH mapping change is
> > 1, 2, 3, 4, 5 -> 1, 2, 4, 5, 6.
> >
> > But if you're storing an EC pool, that means you just changed the data
> > mapped to OSDs 4, 5, and 6! That's terrible! So the "indep" mode attempts
> > to not do that. (It still *might* conflict, but the odds are much lower.)
> > You can instead expect it, when it selects the failed 3, to try again and
> > pick out 6, for a final transformation of:
> > 1, 2, 3, 4, 5 -> 1, 2, 6, 4, 5
> > -Greg
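
The two transformations quoted above can be reproduced with a short Python
sketch. This is a toy model under stated assumptions, not CRUSH itself: the
function names are hypothetical, and the replacement OSD 6 is passed in by
hand rather than chosen by hashing.

```python
# Toy model of the 1,2,3,4,5 example; real CRUSH selects OSDs by hashing.
# Function names are hypothetical and OSD 6 is supplied by hand.

def remap_firstn(mapping, failed, replacement):
    """firstn-like: skip the failed OSD, keep going, append the new one."""
    return [osd for osd in mapping if osd != failed] + [replacement]

def remap_indep(mapping, failed, replacement):
    """indep-like: refill only the failed position."""
    return [replacement if osd == failed else osd for osd in mapping]

def changed_slots(old, new):
    """Count positions whose OSD differs, i.e. EC shards that must move."""
    return sum(1 for a, b in zip(old, new) if a != b)

old = [1, 2, 3, 4, 5]
firstn_new = remap_firstn(old, failed=3, replacement=6)
indep_new = remap_indep(old, failed=3, replacement=6)

print("firstn:", firstn_new, "slots changed:", changed_slots(old, firstn_new))
print("indep: ", indep_new, "slots changed:", changed_slots(old, indep_new))
```

Because each position in an EC pool holds a distinct shard, the number of
changed slots is a rough proxy for how many shards would have to be rebuilt
or moved in this model.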
> >
> >>
> >>
> >>
> >> k
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
