I'm still pretty new to troubleshooting Ceph, and since no one has responded yet I'll take a stab at it.

What is the size of your pool? You can check it with 'ceph osd pool get <pool name> size'. Based on the number of incomplete PGs, it seems like it may have been '1'. As I understand it, if you are able to bring osd.7 back in, this would clear up; I'm just not seeing a secondary OSD for that PG.
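If it does turn out the pool is running with a single replica, checking and raising it would look something like this (just a sketch; 'rbd' here is only a placeholder pool name):

  ceph osd pool get rbd size        # current replica count
  ceph osd pool get rbd min_size    # copies required before the pool will serve I/O
  ceph osd pool set rbd size 2      # keep two copies from here on

Raising 'size' only affects data written or re-replicated from now on, though; it won't bring back copies that existed only on the removed OSD.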
Disclaimer: I could be totally wrong.

Robert LeBlanc

On Thu, Dec 18, 2014 at 11:41 PM, Mallikarjun Biradar <mallikarjuna.bira...@gmail.com> wrote:
>
> Hi all,
>
> I had 12 OSDs in my cluster across 2 OSD nodes. One of the OSDs was in the
> down state, and I removed that OSD from the cluster by removing the crush
> rule for it.
>
> Now the cluster, with 11 OSDs, started rebalancing. After some time, the
> cluster status was:
>
> ems@rack6-client-5:~$ sudo ceph -s
>     cluster eb5452f4-5ce9-4b97-9bfd-2a34716855f1
>      health HEALTH_WARN 1 pgs down; 252 pgs incomplete; 10 pgs peering;
>        73 pgs stale; 262 pgs stuck inactive; 73 pgs stuck stale; 262 pgs
>        stuck unclean; clock skew detected on mon.rack6-client-5,
>        mon.rack6-client-6
>      monmap e1: 3 mons at {rack6-client-4=10.242.43.105:6789/0,
>        rack6-client-5=10.242.43.106:6789/0,rack6-client-6=10.242.43.107:6789/0},
>        election epoch 12, quorum 0,1,2
>        rack6-client-4,rack6-client-5,rack6-client-6
>      osdmap e2648: 11 osds: 11 up, 11 in
>       pgmap v554251: 846 pgs, 3 pools, 4383 GB data, 1095 kobjects
>             11668 GB used, 26048 GB / 37717 GB avail
>                   63 stale+active+clean
>                    1 down+incomplete
>                  521 active+clean
>                  251 incomplete
>                   10 stale+peering
> ems@rack6-client-5:~$
>
> To fix this, I can't run "ceph osd lost <osd.id>" to remove the PG which
> is in the down state, as the OSD has already been removed from the cluster.
>
> ems@rack6-client-4:~$ sudo ceph pg dump all | grep down
> dumped all in format plain
> 1.38  1548  0  0  0  0  6492782592  3001  3001  down+incomplete
>   2014-12-18 15:58:29.681708  1118'508438  2648:1073892  [6,3,1]  6
>   [6,3,1]  6  76'437184  2014-12-16 12:38:35.322835  76'437184
>   2014-12-16 12:38:35.322835
> ems@rack6-client-4:~$
>
> ems@rack6-client-4:~$ sudo ceph pg 1.38 query
> .............
>   "recovery_state": [
>         { "name": "Started\/Primary\/Peering",
>           "enter_time": "2014-12-18 15:58:29.681666",
>           "past_intervals": [
>                 { "first": 1109,
>                   "last": 1118,
>                   "maybe_went_rw": 1,
> ...................
> ...................
>           "down_osds_we_would_probe": [
>                 7],
>           "peering_blocked_by": []},
> ...................
> ...................
>
> ems@rack6-client-4:~$ sudo ceph osd tree
> # id    weight  type name               up/down reweight
> -1      36.85   root default
> -2      20.1            host rack2-storage-1
> 0       3.35                    osd.0   up      1
> 1       3.35                    osd.1   up      1
> 2       3.35                    osd.2   up      1
> 3       3.35                    osd.3   up      1
> 4       3.35                    osd.4   up      1
> 5       3.35                    osd.5   up      1
> -3      16.75           host rack2-storage-5
> 6       3.35                    osd.6   up      1
> 8       3.35                    osd.8   up      1
> 9       3.35                    osd.9   up      1
> 10      3.35                    osd.10  up      1
> 11      3.35                    osd.11  up      1
> ems@rack6-client-4:~$ sudo ceph osd lost 7 --yes-i-really-mean-it
> osd.7 is not down or doesn't exist
> ems@rack6-client-4:~$
>
> Can somebody suggest any other recovery step to come out of this?
>
> -Thanks & Regards,
> Mallikarjun Biradar
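One more thought on the failed 'ceph osd lost 7' at the end of the quoted output: since osd.7 has already been removed from the osdmap, the monitors have nothing left to mark as lost. One possible workaround (untested by me, and it permanently gives up on any data that only lived on osd.7) is to recreate the id just long enough to mark it lost, roughly like this, assuming 7 is still the lowest free id so that 'ceph osd create' hands it back:

  ceph osd create                          # should return the lowest free id, i.e. 7 here
  ceph osd lost 7 --yes-i-really-mean-it   # tell the cluster that copy is gone for good
  ceph osd rm 7                            # remove the placeholder entry again

If PG 1.38 still won't go active after that, 'ceph pg force_create_pg 1.38' recreates it as an empty PG, which likewise means accepting the data loss for that PG.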