On Mon, May 21, 2018 at 11:19 AM Andras Pataki <
apat...@flatironinstitute.org> wrote:

> Hi Greg,
>
> Thanks for the detailed explanation - the examples make a lot of sense.
>
> One followup question regarding a two level crush rule like:
>
>
> step take default
> step choose 3 type=rack
> step chooseleaf 3 type=host
> step emit
>
> If the erasure code has 9 chunks, this lines up exactly without any
> problems.  What if the erasure code isn't an exact product of racks and
> hosts per rack, for example 6+2 with the above rule?  Will it just put 3
> chunks in each of the first two racks and 2 in the last without any issues?
>

Yes, assuming your ceph install is new enough. (At one point it crashed if
you did that :o)
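
For reference, a minimal sketch of how that two-level rule might look in a
decompiled crushmap for an erasure-coded pool (the rule name, id, and the
tries values are placeholders; "indep" is the mode EC pools normally use,
and "type rack" rather than "type=rack" is the actual compiled syntax).
With a 6+2 pool the rule still produces 9 candidate positions and the pool
only uses the first 8 of them, which is how you end up with 3, 3 and 2
chunks across the three racks:

rule ec_racks_hosts {
    id 2
    type erasure
    min_size 3
    max_size 9
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 3 type rack
    step chooseleaf indep 3 type host
    step emit
}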



> The other direction, I presume, can't work, i.e. with the above example I
> can't use an erasure code with more than 9 chunks.
>

Right
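
If you want to sanity-check that ahead of time, crushtool can simulate the
mappings offline. A rough sketch, assuming you've dumped the crushmap to a
file and the rule in question has id 2:

ceph osd getcrushmap -o crushmap.bin
# 8 chunks should map cleanly (3 + 3 + 2 across the racks)
crushtool -i crushmap.bin --test --rule 2 --num-rep 8 --show-mappings
# asking for more chunks than the rule can place will show up here
crushtool -i crushmap.bin --test --rule 2 --num-rep 10 --show-bad-mappings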


>
> Andras
>
>
>
> On 05/18/2018 06:30 PM, Gregory Farnum wrote:
>
> On Thu, May 17, 2018 at 9:05 AM Andras Pataki <
> apat...@flatironinstitute.org> wrote:
>
>> I've been trying to wrap my head around crush rules, and I need some
>> help/advice.  I'm thinking of using erasure coding instead of
>> replication, and trying to understand the possibilities for planning for
>> failure cases.
>>
>> For a simplified example, consider a 2 level topology, OSDs live on
>> hosts, and hosts live in racks.  I'd like to set up a rule for a 6+3
>> erasure code that would put at most 1 of the 9 chunks on a host, and no
>> more than 3 chunks in a rack (so in case the rack is lost, we still have
>> a way to recover).  Some racks may not have 3 hosts in them, so they
>> could potentially accept only 1 or 2 chunks then.  How can something
>> like this be implemented as a crush rule?  Or, if not exactly this,
>> something in this spirit?  I don't want to say that all chunks need to
>> live in a separate rack because that is too restrictive (some racks may
>> be much bigger than others, or there might not even be 9 racks).
>>
>
> Unfortunately what you describe here is a little too detailed in ways
> CRUSH can't easily specify. You should think of a CRUSH rule as a sequence
> of steps that start out at a root (the "take" step), and incrementally
> specify more detail about which piece of the CRUSH hierarchy they run on,
> but run the *same* rule on every piece they select.
>
> So the simplest thing that comes close to what you suggest is:
> (forgive me if my syntax is slightly off, I'm doing this from memory)
> step take default
> step chooseleaf n type=rack
> step emit
>
> That would start at the default root, select "n" racks (9, in your case)
> and then for each rack find an OSD within it. (chooseleaf is special and
> more flexible than most of the CRUSH language; it's nice because if it
> can't find an OSD in one of the selected racks, it will pick another rack).
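> In a decompiled crushmap that shorthand would look something like the sketch
> below (the rule name and id are placeholders; a count of 0 means "as many as
> the pool asks for", i.e. the "n" above, and an erasure-coded pool would use
> indep rather than firstn):
>
> rule ec_by_rack {
>     id 1
>     type erasure
>     min_size 3
>     max_size 9
>     step take default
>     step chooseleaf indep 0 type rack
>     step emit
> }
>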
> But a rule that's more illustrative of how things work is:
> step take default
> step choose 3 type=rack
> step chooseleaf 3 type=host
> step emit
>
> That one selects three racks, then selects three OSDs within different
> hosts *in each rack*. (You'll note that it doesn't necessarily work out so
> well if you don't want 9 OSDs!) If one of the racks it selected doesn't
> have 3 separate hosts...well, tough, it tried to do what you told it. :/
>
> If you were dedicated, you could split up your racks into
> equivalently-sized units — let's say rows. Then you could do
> step take default
> step choose 3 type=row
> step chooseleaf 3 type=host
> step emit
>
> Assuming you have 3+ rows of good size, that'll get you 9 OSDs which are
> all on different hosts.
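>
> Once you've settled on a rule, the usual workflow (sketched here with
> placeholder file, pool, and rule names) is to pull the crushmap out, edit
> it, push it back, and then point the pool at the new rule:
>
> ceph osd getcrushmap -o crushmap.bin
> crushtool -d crushmap.bin -o crushmap.txt
> # add the rule to crushmap.txt, then recompile and inject it
> crushtool -c crushmap.txt -o crushmap-new.bin
> ceph osd setcrushmap -i crushmap-new.bin
> ceph osd pool set mypool crush_rule my_ec_rule
>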
> -Greg
>
>
>>
>> Thanks,
>>
>> Andras
>>
>
>