> In both cases, you only get 2 replicas on the remaining 2 hosts.

OK, I was able to reproduce this with crushtool.
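The test was along these lines, with both OSDs of one host marked out for the
test run (the device numbers are just an example; --show-bad-mappings prints
the inputs that end up with fewer than --num-rep OSDs):

# crushtool --test -i my.map --rule 0 --num-rep 3 \
      --weight 0 0 --weight 1 0 --show-bad-mappings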

> The difference is if you have 4 hosts with 2 osds.  In the choose case, you
> have some fraction of the data that chose the down host in the first step
> (most of the attempts, actually!) and then couldn't find a usable osd,
> leaving you with only 2 replicas.

This is also reproducible.

> With chooseleaf that doesn't happen.
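For clarity, the two rule forms being compared look roughly like this (only
the chooseleaf variant appears in my actual map below):

        # choose: pick N hosts first, then one osd on each.  If the host
        # chosen in the first step is down, the second step finds no usable
        # osd and that replica is simply missing.
        step take default
        step choose firstn 0 type host
        step choose firstn 1 type osd
        step emit

        # chooseleaf: pick N hosts that can each supply a usable osd, so a
        # down host is skipped instead of costing a replica.
        step take default
        step chooseleaf firstn 0 type host
        step emit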
> 
> The other difference is if you have one of the two OSDs on the host marked
> out.  In the choose case, the remaining OSD will get allocated 2x the data;
> in the chooseleaf case, usage will remain proportional with the rest of the
> cluster and the data from the out OSD will be distributed across other OSDs
> (at least when there are > 3 hosts!).

I see, but the data distribution does not seem optimal in that case.
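As an aside, I assume the marked-out case could be simulated the same way by
reweighting a single OSD to 0 for the test run, e.g.:

# crushtool --test -i my.map --rule 0 --num-rep 3 \
      --weight 1 0 --show-utilization

The numbers below are with all OSDs in, though.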

For example, using this crush map:

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host prox-ceph-1 {
        id -2           # do not change unnecessarily
        # weight 7.260
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.630
        item osd.1 weight 3.630
}
host prox-ceph-2 {
        id -3           # do not change unnecessarily
        # weight 7.260
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 3.630
        item osd.3 weight 3.630
}
host prox-ceph-3 {
        id -4           # do not change unnecessarily
        # weight 3.630
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 3.630
}

host prox-ceph-4 {
        id -5           # do not change unnecessarily
        # weight 3.630
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 3.630
}

root default {
        id -1           # do not change unnecessarily
        # weight 21.780
        alg straw
        hash 0  # rjenkins1
        item prox-ceph-1 weight 7.260   # 2 OSDs
        item prox-ceph-2 weight 7.260   # 2 OSDs
        item prox-ceph-3 weight 3.630   # 1 OSD
        item prox-ceph-4 weight 3.630   # 1 OSD
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map

crushtool shows the following utilization:

# crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
  device 0:     423
  device 1:     452
  device 2:     429
  device 3:     452
  device 4:     661
  device 5:     655

Any explanation for that?  The numbers add up to 3072 placements, so with six
equally weighted OSDs I would expect roughly 512 per device, yet osd.4 and
osd.5 each get around 660 while the other four get around 440.  Maybe it is
related to the small number of devices?
