>
>
> Hi David,
>
> Yes, health detail outputs all the errors etc., and recovery/backfill is
> going on, just taking time: 25% misplaced and 1.5% degraded.
>
> I can list out the pools and see sizes etc.
>
> My main problem is that I have no client IO from a read perspective: I
> cannot start VMs in OpenStack, and ceph -w shows rd at zero most of the
> time.
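>
> To rule out the VM layer, I can try a raw RADOS read test, something
> along these lines (pool name is a placeholder; the seq test reads the
> objects left behind by a prior write bench run with --no-cleanup):
>
>     rados bench -p <pool> 10 write --no-cleanup
>     rados bench -p <pool> 10 seq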
>
> I can restart the nodes and the OSDs reconnect and come in correctly.
>
> I was chasing a missing object but was able to correct that by setting the
> crush weight to 0.0 on the secondary.
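>
> For reference, the reweight was along these lines (the id here is a
> placeholder for the secondary's OSD):
>
>     ceph osd crush reweight osd.<id> 0.0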
>
> Kind regards
>
> Lee
>
> On Sun, 2 Sep 2018, 13:43 David C, <dcsysengin...@gmail.com> wrote:
>
>> Does "ceph health detail" work?
>> Have you manually confirmed the OSDs on the nodes are working?
>> What was the replica size of the pools?
>> Are you seeing any progress with the recovery?
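>>
>> Roughly the commands I'd use to check the above (pool name is a
>> placeholder):
>>
>>     ceph health detail
>>     ceph osd tree
>>     ceph osd pool get <pool> size
>>     ceph osd pool get <pool> min_size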
>>
>>
>>
>> On Sun, Sep 2, 2018 at 9:42 AM Lee <lqui...@gmail.com> wrote:
>>
>>> Running 0.94.5 as part of an OpenStack environment; our Ceph setup is
>>> 3x OSD nodes and 3x MON nodes. Yesterday we had an aircon outage in
>>> our hosting environment: one OSD node failed (offline, with its
>>> journal SSD dead), leaving 2 nodes running correctly. Two hours later
>>> a second OSD node failed, complaining of read/write errors to the
>>> physical drives; I assume this was a heat issue, as it came back
>>> online OK when rebooted and Ceph started to repair itself. We have
>>> since brought the first failed node back by replacing the SSD and
>>> recreating the journals, hoping it would all repair. Our pools are
>>> replicated, min size 2.
>>>
>>> The problem we have is that client IO (reads) is totally blocked, and
>>> when I query the stuck PGs it just hangs.
>>>
>>> For example, the version check command just errors with "Error EINTR:
>>> problem getting command descriptions from" on various OSDs, so I
>>> cannot even query the inactive PGs.
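>>>
>>> For reference, the query that hangs is along these lines (pgid is a
>>> placeholder, taken from the dump_stuck output):
>>>
>>>     ceph pg dump_stuck inactive
>>>     ceph pg <pgid> query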
>>>
>>> root@node31-a4:~# ceph -s
>>>     cluster 7c24e1b9-24b3-4a1b-8889-9b2d7fd88cd2
>>>      health HEALTH_WARN
>>>             83 pgs backfill
>>>             2 pgs backfill_toofull
>>>             3 pgs backfilling
>>>             48 pgs degraded
>>>             1 pgs down
>>>             31 pgs incomplete
>>>             1 pgs recovering
>>>             29 pgs recovery_wait
>>>             1 pgs stale
>>>             48 pgs stuck degraded
>>>             31 pgs stuck inactive
>>>             1 pgs stuck stale
>>>             148 pgs stuck unclean
>>>             17 pgs stuck undersized
>>>             17 pgs undersized
>>>             599 requests are blocked > 32 sec
>>>             recovery 111489/4697618 objects degraded (2.373%)
>>>             recovery 772268/4697618 objects misplaced (16.440%)
>>>             recovery 1/2171314 unfound (0.000%)
>>>      monmap e5: 3 mons at {bc07s12-a7=172.27.16.11:6789/0,bc07s13-a7=172.27.16.21:6789/0,bc07s14-a7=172.27.16.15:6789/0}
>>>             election epoch 198, quorum 0,1,2 bc07s12-a7,bc07s14-a7,bc07s13-a7
>>>      osdmap e18727: 25 osds: 25 up, 25 in; 90 remapped pgs
>>>       pgmap v70996322: 1792 pgs, 13 pools, 8210 GB data, 2120 kobjects
>>>             16783 GB used, 6487 GB / 23270 GB avail
>>>             111489/4697618 objects degraded (2.373%)
>>>             772268/4697618 objects misplaced (16.440%)
>>>             1/2171314 unfound (0.000%)
>>>                 1639 active+clean
>>>                   66 active+remapped+wait_backfill
>>>                   30 incomplete
>>>                   25 active+recovery_wait+degraded
>>>                   15 active+undersized+degraded+remapped+wait_backfill
>>>                    4 active+recovery_wait+degraded+remapped
>>>                    4 active+clean+scrubbing
>>>                    2 active+remapped+wait_backfill+backfill_toofull
>>>                    1 down+incomplete
>>>                    1 active+remapped+backfilling
>>>                    1 active+clean+scrubbing+deep
>>>                    1 stale+active+undersized+degraded
>>>                    1 active+undersized+degraded+remapped+backfilling
>>>                    1 active+degraded+remapped+backfilling
>>>                    1 active+recovering+degraded
>>>   recovery io 29385 kB/s, 7 objects/s
>>>   client io 5877 B/s wr, 1 op/s
>>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
