Hi,

If you add an OSD to an existing cluster, Ceph will move some existing data around so that the new OSD gets its share of the usage right away.

I noticed that during this data movement, Ceph reports the affected PGs as degraded. I can more or less understand the logic: if a piece of data is supposed to be in a certain place (the new OSD) but is not yet there, it is "degraded". However, I would hope that the movement is carried out in such a way that a new copy is first made on the new OSD, and only after that succeeds is one of the existing copies removed. If so, the PG is never actually degraded.

More to the point: suppose I have a PG replicated over three OSDs, 1, 2 and 3. Now I add an OSD 4, and Ceph decides to move the copy on OSD 3 to the new OSD 4. If it turns out that Ceph cannot read the copies on OSDs 1 and 2 due to some disk error, I would assume it would still use the copy that exists on OSD 3 to populate the copy on OSD 4. Is that indeed the case?

I have a very similar question about removing an OSD. You can tell Ceph to mark an OSD as "out" before physically removing it. The OSD is still "up", but Ceph will no longer assign PGs to it and will make new copies of the PGs that are on this OSD on other OSDs. Again Ceph reports degradation, even though the "out" OSD is still "up", so the existing copies are not actually lost. Does Ceph use the OSD that is marked "out" as a source when making the new copies on the other OSDs?

Thanks,
Erik
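
For reference, the removal workflow I have in mind is roughly the following. This is only a minimal sketch in Python, assuming the "ceph" CLI is on the PATH with admin credentials, and the OSD id is just an example, not one of my actual OSDs:

#!/usr/bin/env python3
# Sketch: mark an OSD "out", wait for all PGs to return to active+clean,
# then consider it safe to stop the daemon and remove the disk.
import json
import subprocess
import time

OSD_ID = 4  # hypothetical: the OSD I intend to remove


def ceph_json(*args):
    # Run a ceph CLI command and parse its JSON output.
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)


# Mark the OSD "out": it stays "up", but CRUSH stops mapping PGs to it
# and its data gets re-replicated onto the remaining OSDs.
subprocess.check_call(["ceph", "osd", "out", str(OSD_ID)])

# Poll cluster status until every PG is active+clean again.
while True:
    status = ceph_json("status")
    states = status["pgmap"]["pgs_by_state"]
    if all(s["state_name"] == "active+clean" for s in states):
        break
    time.sleep(10)

print("All PGs active+clean; stopping osd.%d and removing it." % OSD_ID)

My question is about what happens between the "osd out" step and the point where everything is active+clean again, i.e. whether the drained OSD is used as a recovery source during that window.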
Now I noticed that during this moving around, ceph reports the relevant PG's as degraded. I can more or less understand the logic here: if a piece of data is supposed to be in a certain place (the new OSD), but it is not yet there, it's degraded. However I would hope that the movement of data is executed in such a way that first a new copy is made on the new OSD and only after successfully doing that, one of the existing copies is removed. If so, there is never actually any "degradation" of that PG. More to the point, if I have a PG replicated over three OSD's: 1, 2 and 3; now I add an OSD 4, and ceph decides to move the copy of OSD 3 to the new OSD 4; if it turns out that ceph can't read the copies on OSD 1 and 2 due to some disk error, I would assume that ceph would still use the copy that exists on OSD 3 to populate the copy on OSD 4. Is that indeed the case? I have a very similar question about removing an OSD. You can tell ceph to mark an OSD as "out" before physically removing it. The OSD is still "up" but ceph will no longer assign PG's to it, and will make new copies of the PG's that are on this OSD to other OSD's. Now again ceph will report degradation, even though the "out" OSD is still "up", so the existing copies are not actually lost. Does ceph use the OSD that is marked "out" as a source for making the new copies on other OSD's? Thanks, Erik. _______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com