Hi all,

I had 12 OSDs in my cluster, spread across 2 OSD nodes. One of the OSDs went down, so I removed that OSD from the cluster by removing its CRUSH rule.
With 11 OSDs remaining, the cluster started rebalancing. After some time, the cluster status was:

ems@rack6-client-5:~$ sudo ceph -s
    cluster eb5452f4-5ce9-4b97-9bfd-2a34716855f1
     health HEALTH_WARN 1 pgs down; 252 pgs incomplete; 10 pgs peering;
            73 pgs stale; 262 pgs stuck inactive; 73 pgs stuck stale;
            262 pgs stuck unclean; clock skew detected on
            mon.rack6-client-5, mon.rack6-client-6
     monmap e1: 3 mons at {rack6-client-4=10.242.43.105:6789/0,rack6-client-5=10.242.43.106:6789/0,rack6-client-6=10.242.43.107:6789/0}, election epoch 12, quorum 0,1,2 rack6-client-4,rack6-client-5,rack6-client-6
     osdmap e2648: 11 osds: 11 up, 11 in
      pgmap v554251: 846 pgs, 3 pools, 4383 GB data, 1095 kobjects
            11668 GB used, 26048 GB / 37717 GB avail
                  63 stale+active+clean
                   1 down+incomplete
                 521 active+clean
                 251 incomplete
                  10 stale+peering
ems@rack6-client-5:~$

To fix this, I cannot run "ceph osd lost <osd.id>" against the PG that is in the down state, because that OSD has already been removed from the cluster.

ems@rack6-client-4:~$ sudo ceph pg dump all | grep down
dumped all in format plain
1.38  1548  0  0  0  0  6492782592  3001  3001  down+incomplete  2014-12-18 15:58:29.681708  1118'508438  2648:1073892  [6,3,1]  6  [6,3,1]  6  76'437184  2014-12-16 12:38:35.322835  76'437184  2014-12-16 12:38:35.322835
ems@rack6-client-4:~$

ems@rack6-client-4:~$ sudo ceph pg 1.38 query
.............
"recovery_state": [
      { "name": "Started\/Primary\/Peering",
        "enter_time": "2014-12-18 15:58:29.681666",
        "past_intervals": [
              { "first": 1109,
                "last": 1118,
                "maybe_went_rw": 1,
...................
...................
        "down_osds_we_would_probe": [
              7],
        "peering_blocked_by": []},
...................
...................
ems@rack6-client-4:~$ sudo ceph osd tree
# id    weight  type name               up/down reweight
-1      36.85   root default
-2      20.1            host rack2-storage-1
0       3.35                    osd.0   up      1
1       3.35                    osd.1   up      1
2       3.35                    osd.2   up      1
3       3.35                    osd.3   up      1
4       3.35                    osd.4   up      1
5       3.35                    osd.5   up      1
-3      16.75           host rack2-storage-5
6       3.35                    osd.6   up      1
8       3.35                    osd.8   up      1
9       3.35                    osd.9   up      1
10      3.35                    osd.10  up      1
11      3.35                    osd.11  up      1
ems@rack6-client-4:~$ sudo ceph osd lost 7 --yes-i-really-mean-it
osd.7 is not down or doesn't exist
ems@rack6-client-4:~$

Can somebody suggest any other recovery steps to come out of this?

Thanks & regards,
Mallikarjun Biradar
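[For readers hitting the same state: since "ceph osd lost 7" fails only because osd.7 no longer exists in the osdmap, one workaround that has been discussed is to re-create the OSD id so it can be marked lost. This is a sketch only, not a verified fix for this cluster; it assumes a Ceph release of this era (Firefly/Giant) and that 7 is the lowest free OSD id, so "ceph osd create" re-allocates it. force_create_pg discards the PG's contents and is a last resort.]

```shell
# Sketch of a possible recovery path -- verify against your Ceph version
# before running; these commands act on the live cluster.

# "ceph osd create" allocates the lowest unused OSD id.
# Assumption: id 7 is the lowest free id, so this brings osd.7 back
# into the osdmap as a down/out entry.
sudo ceph osd create

# Mark the re-created osd.7 down, then lost, so peering stops waiting
# to probe it for PG 1.38.
sudo ceph osd down 7
sudo ceph osd lost 7 --yes-i-really-mean-it

# If PG 1.38 still stays down+incomplete afterwards, the last resort is
# to recreate it empty -- THIS DISCARDS THE PG'S DATA:
sudo ceph pg force_create_pg 1.38
```

Once the PG peers again, the osd.7 entry created above can be removed with "ceph osd rm 7".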
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com