[ceph-users] stuck with active+undersized+degraded on Jewel after cluster maintenance

Pawel S Fri, 03 Aug 2018 04:46:47 -0700

Hello!

We did maintenance work (shrinking the cluster) on one of our clusters
(Jewel), and after shutting down one of the OSDs we ended up in a situation
where recovery of a PG cannot start because it is "querying" one of its
peers. We restarted this OSD and tried marking it out and in again. Nothing
helped, so finally we moved the data off it (the PG was still on it) and
removed the OSD from CRUSH and from the whole cluster. Recovery still does
not start on any other OSD to recreate this copy. We still have 2 valid
active copies, but we would like the PG to be clean.
How can we force recovery so that the third copy is created somewhere?
Replication size is 3 across hosts, and there are plenty of hosts.


Status now:
   health HEALTH_WARN
            1 pgs degraded
            1 pgs stuck degraded
            1 pgs stuck unclean
            1 pgs stuck undersized
            1 pgs undersized
            recovery 268/19265130 objects degraded (0.001%)
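
To put the last status line in perspective, here is a minimal Python sketch
that parses it and recomputes the degraded percentage. The function name
degraded_ratio and the regular expression are illustrative assumptions based
only on the line format shown above; they are not part of any Ceph tooling.

```python
import re


def degraded_ratio(line):
    """Parse a 'recovery X/Y objects degraded' line (format assumed from the
    status output above) and return (degraded, total, percent), or None."""
    m = re.search(r"recovery\s+(\d+)/(\d+)\s+objects degraded", line)
    if m is None:
        return None
    degraded, total = int(m.group(1)), int(m.group(2))
    return degraded, total, 100.0 * degraded / total


status_line = "recovery 268/19265130 objects degraded (0.001%)"
degraded, total, pct = degraded_ratio(status_line)
print(f"{degraded} of {total} objects degraded ({pct:.3f}%)")
```

The recomputed value rounds to the 0.001% reported by the cluster, which is
consistent with only a single PG being affected.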

The PG query details, health status and version commit are available here:
https://gist.github.com/pejotes/aea71ecd2718dbb3ceab0e648924d06b

Best regards!
-- 
Pawel
