I've had similar issues, but I think your problem might be something
else. Could you send the output of "ceph osd df"?
Other people will probably be interested in what version you are using
as well.
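Something like the following should capture all of that in one go
(assuming a Luminous-era cluster where "ceph versions" is available; on
older releases "ceph -v" per node works instead):

  ceph osd df        # per-OSD utilization and PG counts
  ceph versions      # versions reported by all running daemons
  ceph -s            # overall health, including activating/stuck PGs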
On 2018-03-27 at 20:07, Jon Light wrote:
Hi all,
I'm adding a new OSD node with 36 OSDs to my cluster and have run into
some problems. Here are some of the details of the cluster:
1 OSD node with 80 OSDs
1 EC pool with k=10, m=3
pg_num 1024
osd failure domain
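(For context, a pool with those parameters would typically have been
created along these lines; the profile and pool names here are
placeholders and may not match what was actually used:)

  ceph osd erasure-code-profile set ec-10-3 \
      k=10 m=3 crush-failure-domain=osd crush-device-class=hdd
  ceph osd pool create main-storage 1024 1024 erasure ec-10-3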
I added a second OSD node and started creating OSDs with ceph-deploy,
one at a time. The first two went in fine, but each subsequent new OSD
resulted in more and more PGs stuck activating. I've added a total of
14 new OSDs, but had to set 12 of them to a CRUSH weight of 0 to keep
the cluster healthy and usable until I get this fixed.
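(The weight-0 step above is normally done per OSD; the IDs in this
sketch are made up and not the actual ones involved:)

  # take the new OSDs out of data placement without removing them
  for id in $(seq 82 93); do
      ceph osd crush reweight osd.$id 0
  done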
I have read about similar behavior caused by PG overdose protection,
but I don't think that's the case here because the failure domain is
set to osd. Instead, I think my CRUSH rule needs some attention:
rule main-storage {
        id 1
        type erasure
        min_size 3
        max_size 13
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 0 type osd
        step emit
}
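(For what it's worth, a rule like this can be sanity-checked offline
with crushtool before touching the cluster; the file names below are
arbitrary, and rule id 1 / num-rep 13 come from the rule and the
k+m=13 pool above:)

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt          # human-readable copy
  crushtool -i crush.bin --test --rule 1 --num-rep 13 \
      --show-bad-mappings                      # any output here is a problem

If that prints bad mappings, the rule or its choose_tries settings are
the likelier culprit; if it maps cleanly, the problem is probably
elsewhere.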
I don't believe I have modified anything in the automatically
generated rule except for the addition of the hdd class.
I have been reading the documentation on CRUSH rules, but am having
trouble figuring out whether the rule is set up properly. After a few
more nodes are added I do want to change the failure domain to host,
but osd is sufficient for now.
Can anyone help out to see if the rule is causing the problems or if I
should be looking at something else?
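(On the overdose-protection point: one quick way to rule it out is to
compare the PGS column of "ceph osd df" against mon_max_pg_per_osd,
which defaults to 200 on Luminous if I recall correctly. A rough
sketch, assuming the mon id is the short hostname and that a runtime
injection is acceptable until ceph.conf is updated:)

  ceph osd df                                              # watch the PGS column
  ceph daemon mon.$(hostname -s) config show | grep pg_per_osd
  ceph tell mon.\* injectargs '--mon_max_pg_per_osd=400'   # temporary bump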
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com