Hi Patrick,

We actually already tried that before (pick three replicas from different racks, then the 4th from the hosts). What we ended up with was a PG that was mapped to the same OSD twice ... so we rolled back in the end :)
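For reference, what we had tried was roughly a two-pass rule along these lines (the rule name and the exact firstn values here are only a sketch, not the map we actually deployed):

    rule metadata_spread {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        # first pass: one leaf (OSD) in each of up to 3 distinct racks
        step take default
        step chooseleaf firstn 3 type rack
        step emit
        # second pass: the remaining replica(s) from hosts anywhere under the root
        step take default
        step chooseleaf firstn -3 type host
        step emit
    }

Since the second pass starts again from the root and knows nothing about what the first pass already picked, it can land on an OSD that is already in the result, which is presumably how we ended up with the duplicate. For anyone trying the same thing, running the compiled map through crushtool --test with --num-rep 4 and --show-mappings should make such duplicates visible before the map is pushed to the cluster.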
Then we thought it might be better to ask how to do this, as the scenario described shouldn't be an exotic use case.

Cheers,
 Arne

--
Arne Wiebalck
CERN IT

On Mar 19, 2013, at 3:09 PM, Patrick McGarry <patr...@inktank.com> wrote:

Hey Arne,

So I am not one of the CRUSH wizards by any means, but while we are waiting for them I wanted to take a crack at it so you weren't left hanging.

You are able to make more complex choices than just a single chooseleaf statement in your rules. Take the example from the doc where you want one copy on an SSD and one on platter:

http://ceph.com/docs/master/rados/operations/crush-map/

So you can either try to build a "do this N times and put the N-x into this other place" rule (even if "this other place" is just the same hosts), or you could just have it iterate at the host level instead of the rack level.

Perhaps the CRUSH wizards can give you a more elegant solution when they wake up, but I figured this might get you started down a road to play with. Shout if you have questions, and good luck!


Best Regards,

Patrick McGarry
Director, Community || Inktank

http://ceph.com || http://inktank.com
@scuttlemonkey || @ceph || @inktank


On Tue, Mar 19, 2013 at 5:55 AM, Arne Wiebalck <arne.wieba...@cern.ch> wrote:

Hi all,

We're trying to spread data in our Ceph cluster as much as possible, that is, pick different racks first, then different hosts, then different OSDs. It works fine as long as there are enough buckets available, but if we ask for more replicas than we have racks, for instance, the requested number of replicas is not achieved.

For example, what we've seen is that with a replication size of 4, a rule like

    rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
    }

and only 3 (!) racks, we get only 3 replicas, like

    osdmap e3078 pg 1.ba (1.ba) -> up [116,37,161] acting [116,37,161]

What we'd like is basically that CRUSH tries to find 4 different racks (for the 4 replicas), and if it finds only 3, picks 4 different hosts across the 3 racks. Is there an easy way to do this?

BTW, health is OK despite not having enough replicas for the pool. What's the best way to detect such situations where the actual state deviates from the desired state?

TIA,
 Arne

--
Arne Wiebalck
CERN IT
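On the detection part of the question: the simplest check we could come up with is to compare a PG's acting set against the pool's configured replica count, along these lines (assuming the pool behind ruleset 1 is the default "metadata" pool, and using pg 1.ba from the example above):

    # configured replica count of the pool
    ceph osd pool get metadata size

    # mapping of a single PG; an acting set with fewer OSDs than the
    # pool's size means a replica is missing
    ceph pg map 1.ba

    # cluster-wide, "ceph pg dump" lists the up/acting sets of every PG,
    # so the same comparison can be scripted over that output
    ceph pg dump

That works, but it feels like something the cluster should flag on its own.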
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com