On Thu, May 17, 2018 at 9:05 AM Andras Pataki <apat...@flatironinstitute.org> wrote:
> I've been trying to wrap my head around crush rules, and I need some
> help/advice.  I'm thinking of using erasure coding instead of
> replication, and trying to understand the possibilities for planning for
> failure cases.
>
> For a simplified example, consider a 2 level topology, OSDs live on
> hosts, and hosts live in racks.  I'd like to set up a rule for a 6+3
> erasure code that would put at most 1 of the 9 chunks on a host, and no
> more than 3 chunks in a rack (so in case the rack is lost, we still have
> a way to recover).  Some racks may not have 3 hosts in them, so they
> could potentially accept only 1 or 2 chunks then.  How can something
> like this be implemented as a crush rule?  Or, if not exactly this,
> something in this spirit?  I don't want to say that all chunks need to
> live in a separate rack because that is too restrictive (some racks may
> be much bigger than others, or there might not even be 9 racks).
>

Unfortunately what you describe here is a little too detailed in ways
CRUSH can't easily specify. You should think of a CRUSH rule as a
sequence of steps that start out at a root (the "take" step), and
incrementally specify more detail about which piece of the CRUSH
hierarchy they run on, but run the *same* rule on every piece they
select.

So the simplest thing that comes close to what you suggest is:
(forgive me if my syntax is slightly off, I'm doing this from memory)

    step take default
    step chooseleaf indep n type rack
    step emit

That would start at the default root, select "n" racks (9, in your
case) and then for each rack find an OSD within it. (chooseleaf is
special and more flexible than most of the CRUSH language; it's nice
because if it can't find an OSD in one of the selected racks, it will
pick another rack.)

But a rule that's more illustrative of how things work is:

    step take default
    step choose indep 3 type rack
    step chooseleaf indep 3 type host
    step emit

That one selects three racks, then selects three OSDs within different
hosts *in each rack*.
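To make the "same rule on every piece" behaviour concrete, here is a toy
Python model of that two-step rule. It is only a sketch and not real
CRUSH: there is no hashing, weighting, or retry logic, and the
hierarchy, the OSD ids, and the place() helper are all made up for
illustration.

```python
import random

# Hypothetical toy hierarchy: rack -> host -> list of OSD ids.
CLUSTER = {
    "rack1": {"host1": [0, 1], "host2": [2, 3], "host3": [4, 5]},
    "rack2": {"host4": [6, 7], "host5": [8, 9], "host6": [10, 11]},
    "rack3": {"host7": [12, 13], "host8": [14, 15], "host9": [16, 17]},
    "rack4": {"host10": [18, 19], "host11": [20, 21]},  # only two hosts
}


def place(cluster, n_racks, hosts_per_rack, rng):
    """Toy model: choose n_racks racks, then hosts_per_rack hosts in each.

    Returns a list of (rack, host, osd) tuples. The same inner step runs
    in every selected rack; in this simplified model a rack with too few
    hosts just contributes fewer OSDs.
    """
    placement = []
    for rack in rng.sample(sorted(cluster), n_racks):
        hosts = cluster[rack]
        k = min(hosts_per_rack, len(hosts))
        for host in rng.sample(sorted(hosts), k):
            # One OSD per chosen host, so no host holds two chunks.
            placement.append((rack, host, rng.choice(hosts[host])))
    return placement


rng = random.Random(0)  # fixed seed so repeated runs pick the same OSDs

# Three racks with three hosts each: 3 x 3 = 9 OSDs.
full = place({r: CLUSTER[r] for r in ("rack1", "rack2", "rack3")}, 3, 3, rng)
print(len(full))   # 9

# Swap in the two-host rack: the inner step comes up one short.
short = place({r: CLUSTER[r] for r in ("rack1", "rack2", "rack4")}, 3, 3, rng)
print(len(short))  # 8
```

In this model every chunk lands on a different host and no rack holds
more than three, but nothing redistributes the shortfall when a
selected rack has fewer than three hosts.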
(You'll note that it doesn't necessarily work out so well if you don't
want 9 OSDs!) If one of the racks it selected doesn't have 3 separate
hosts... well, tough, it tried to do what you told it. :/

If you were dedicated, you could split up your racks into
equivalently-sized units, let's say rows. Then you could do

    step take default
    step choose indep 3 type row
    step chooseleaf indep 3 type host
    step emit

Assuming you have 3+ rows of good size, that'll get you 9 OSDs which
are all on different hosts.
-Greg

>
> Thanks,
>
> Andras
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com