Thank you, Loic and Greg. We followed the troubleshooting directions and ran
crushtool in test mode to verify that CRUSH was giving up too soon, and then
confirmed that changing the set_choose_tries value to 100 would resolve the
issue (it did).
We then implemented the change in the cluster, wh
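As a rough illustration of why raising the retry limit helps, here is a toy model in Python. It is not CRUSH itself and does not use crushtool: each input has to be mapped to a number of distinct items, and every slot gets only a limited number of pseudo-random draws before it gives up. The function names, the 9 items, and the low limit of 5 tries are assumptions made up for the sketch; only the value 100 comes from the message above.

```python
import random


def map_input(x, n_items, need, tries):
    """Toy placement: pick `need` distinct items out of `n_items` for input x.

    Each slot gets at most `tries` pseudo-random draws. A draw that collides
    with an already chosen item is retried. A slot that runs out of tries is
    left as None, which stands in for a "bad mapping".
    """
    rng = random.Random(x)  # deterministic per input, like a hash of x
    chosen = []
    for _ in range(need):
        pick = None
        for _ in range(tries):
            candidate = rng.randrange(n_items)
            if candidate not in chosen:
                pick = candidate
                break
        chosen.append(pick)
    return chosen


def count_bad_mappings(n_items, need, tries, inputs=1000):
    """Count inputs for which at least one slot could not be filled."""
    return sum(
        1 for x in range(inputs)
        if None in map_input(x, n_items, need, tries)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 9 items, 9 slots, so the last slots collide often.
    for tries in (5, 100):
        bad = count_bad_mappings(n_items=9, need=9, tries=tries)
        print(f"tries={tries}: {bad} bad mappings out of 1000 inputs")
```

When the number of slots is close to the number of available items, the last slots collide on most draws, so a small retry limit leaves many inputs unmapped while a larger limit almost always finds a free item. That is the behaviour the crushtool test run is meant to expose on the real map.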
Hi Paul,
Contrary to what the documentation states at
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon
the crush ruleset can be modified (an update at
https://github.com/ceph/ceph/pull/4306 will fix that). Placement groups will
move around, but that
Thanks for the insights, Greg. It would be great if the CRUSH rule for an EC
pool could be changed dynamically, but if that's not the case, the troubleshooting
doc also suggests adding more OSDs, and we have another 8 OSDs
(one from each node) that we can move into the default root.
Howeve