I'm still pretty new to troubleshooting Ceph, and since no one has responded yet I'll take a stab at it.

What is the size of your pool? You can check it with 'ceph osd pool get <pool name> size'. Based on the number of incomplete PGs, it seems like it may have been '1'. As I understand it, if you are able to bring osd.7 back in, this would clear up; I'm just not seeing a secondary OSD for that PG.
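If it does turn out the pool is running with a single replica, checking and raising it would look something like this (just a sketch; 'rbd' here is only a placeholder pool name):

  ceph osd pool get rbd size        # current replica count
  ceph osd pool get rbd min_size    # copies required before the pool will serve I/O
  ceph osd pool set rbd size 2      # keep two copies from here on

Raising 'size' only affects data written or re-replicated from now on, though; it won't bring back copies that existed only on the removed OSD.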
Disclaimer: I could be totally wrong.

Robert LeBlanc

On Thu, Dec 18, 2014 at 11:41 PM, Mallikarjun Biradar <mallikarjuna.bira...@gmail.com> wrote:
>
> Hi all,
>
> I had 12 OSDs in my cluster across 2 OSD nodes. One of the OSDs was in the
> down state, and I removed that OSD from the cluster by removing the crush
> rule for it.
>
> Now the cluster, with 11 OSDs, started rebalancing. After some time, the
> cluster status was:
>
> ems@rack6-client-5:~$ sudo ceph -s
>     cluster eb5452f4-5ce9-4b97-9bfd-2a34716855f1
>      health HEALTH_WARN 1 pgs down; 252 pgs incomplete; 10 pgs peering;
>        73 pgs stale; 262 pgs stuck inactive; 73 pgs stuck stale; 262 pgs
>        stuck unclean; clock skew detected on mon.rack6-client-5,
>        mon.rack6-client-6
>      monmap e1: 3 mons at {rack6-client-4=10.242.43.105:6789/0,
>        rack6-client-5=10.242.43.106:6789/0,rack6-client-6=10.242.43.107:6789/0},
>        election epoch 12, quorum 0,1,2
>        rack6-client-4,rack6-client-5,rack6-client-6
>      osdmap e2648: 11 osds: 11 up, 11 in
>       pgmap v554251: 846 pgs, 3 pools, 4383 GB data, 1095 kobjects
>             11668 GB used, 26048 GB / 37717 GB avail
>                   63 stale+active+clean
>                    1 down+incomplete
>                  521 active+clean
>                  251 incomplete
>                   10 stale+peering
> ems@rack6-client-5:~$
>
> To fix this, I can't run "ceph osd lost <osd.id>" to remove the PG which
> is in the down state, as the OSD has already been removed from the cluster.
>
> ems@rack6-client-4:~$ sudo ceph pg dump all | grep down
> dumped all in format plain
> 1.38  1548  0  0  0  0  6492782592  3001  3001  down+incomplete
>   2014-12-18 15:58:29.681708  1118'508438  2648:1073892  [6,3,1]  6
>   [6,3,1]  6  76'437184  2014-12-16 12:38:35.322835  76'437184
>   2014-12-16 12:38:35.322835
> ems@rack6-client-4:~$
>
> ems@rack6-client-4:~$ sudo ceph pg 1.38 query
> .............
>   "recovery_state": [
>         { "name": "Started\/Primary\/Peering",
>           "enter_time": "2014-12-18 15:58:29.681666",
>           "past_intervals": [
>                 { "first": 1109,
>                   "last": 1118,
>                   "maybe_went_rw": 1,
> ...................
> ...................
>           "down_osds_we_would_probe": [
>                 7],
>           "peering_blocked_by": []},
> ...................
> ...................
>
> ems@rack6-client-4:~$ sudo ceph osd tree
> # id    weight  type name               up/down reweight
> -1      36.85   root default
> -2      20.1            host rack2-storage-1
> 0       3.35                    osd.0   up      1
> 1       3.35                    osd.1   up      1
> 2       3.35                    osd.2   up      1
> 3       3.35                    osd.3   up      1
> 4       3.35                    osd.4   up      1
> 5       3.35                    osd.5   up      1
> -3      16.75           host rack2-storage-5
> 6       3.35                    osd.6   up      1
> 8       3.35                    osd.8   up      1
> 9       3.35                    osd.9   up      1
> 10      3.35                    osd.10  up      1
> 11      3.35                    osd.11  up      1
> ems@rack6-client-4:~$ sudo ceph osd lost 7 --yes-i-really-mean-it
> osd.7 is not down or doesn't exist
> ems@rack6-client-4:~$
>
> Can somebody suggest any other recovery step to come out of this?
>
> -Thanks & Regards,
> Mallikarjun Biradar
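One more thought on the failed 'ceph osd lost 7' at the end of the quoted output: since osd.7 has already been removed from the osdmap, the monitors have nothing left to mark as lost. One possible workaround (untested by me, and it permanently gives up on any data that only lived on osd.7) is to recreate the id just long enough to mark it lost, roughly like this, assuming 7 is still the lowest free id so that 'ceph osd create' hands it back:

  ceph osd create                          # should return the lowest free id, i.e. 7 here
  ceph osd lost 7 --yes-i-really-mean-it   # tell the cluster that copy is gone for good
  ceph osd rm 7                            # remove the placeholder entry again

If PG 1.38 still won't go active after that, 'ceph pg force_create_pg 1.38' recreates it as an empty PG, which likewise means accepting the data loss for that PG.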