Hi David,

Yes, health detail outputs all the errors etc. and recovery/backfill is
going on, just taking time: 25% misplaced and 1.5% degraded.

I can list out the pools and see their sizes etc.

My main problem is that I have no client IO from a read perspective; I
cannot start VMs in OpenStack, and "ceph -w" shows that rd is zero most of
the time.
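
For reference, this is roughly what I'm looking at (nothing below is
specific to our cluster):

    # watch cluster-wide IO rates; the "client io ... rd" figure stays at zero
    ceph -w

    # list the PGs stuck inactive, which is what blocks the reads
    ceph pg dump_stuck inactive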

I can restart the nodes, and the OSDs reconnect and come back in correctly.

I was chasing a missing object but was able to correct that by setting the
CRUSH weight to 0.0 on the secondary.
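
(That was just a plain CRUSH reweight; osd.N below stands in for the
actual secondary OSD:)

    # drop the OSD's CRUSH weight to zero so data is placed elsewhere
    ceph osd crush reweight osd.N 0.0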

Kind regards

Lee

On Sun, 2 Sep 2018, 13:43 David C, <dcsysengin...@gmail.com> wrote:

> Does "ceph health detail" work?
> Have you manually confirmed the OSDs on the nodes are working?
> What was the replica size of the pools?
> Are you seeing any progress with the recovery?
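>
> Off the top of my head, something like the following should answer most
> of those (nothing here is specific to your cluster):
>
>     # replica size / min_size of every pool
>     ceph osd dump | grep pool
>
>     # confirm the OSDs are up/in and weighted as expected
>     ceph osd tree
>
>     # watch the recovery/backfill rates over time
>     ceph -w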
>
>
>
> On Sun, Sep 2, 2018 at 9:42 AM Lee <lqui...@gmail.com> wrote:
>
>> Running 0.94.5 as part of an OpenStack environment. Our Ceph setup is 3x
>> OSD nodes and 3x MON nodes. Yesterday we had an aircon outage in our
>> hosting environment and 1 OSD node failed (offline, with the journal SSD
>> dead), leaving 2 nodes running correctly. 2 hours later a second OSD
>> node failed, complaining of read/write errors to the physical drives; I
>> assume this was a heat issue, as it came back online OK when rebooted
>> and Ceph started to repair itself. We have since brought the first
>> failed node back in by replacing the SSD and recreating the journals,
>> hoping it would all repair. Our pools are min 2 replicas.
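>>
>> (For reference, the journals were recreated roughly like this after
>> swapping the SSD; the OSD id is a placeholder:)
>>
>>     # with the OSD stopped, write a fresh journal, then restart it
>>     ceph-osd -i N --mkjournal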
>>
>> The problem we have is that client IO (read) is totally blocked, and
>> when I query the stuck PGs it just hangs.
>>
>> For example, the check version command just errors with "Error EINTR:
>> problem getting command descriptions from" on various OSDs, so I cannot
>> even query the inactive PGs.
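>>
>> (The commands in question, with placeholder ids:)
>>
>>     ceph pg 3.1a query       # hangs for the stuck PGs
>>     ceph tell osd.0 version  # -> Error EINTR: problem getting command
>>                              #    descriptions from osd.0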
>>
>> root@node31-a4:~# ceph -s
>>     cluster 7c24e1b9-24b3-4a1b-8889-9b2d7fd88cd2
>>      health HEALTH_WARN
>>             83 pgs backfill
>>             2 pgs backfill_toofull
>>             3 pgs backfilling
>>             48 pgs degraded
>>             1 pgs down
>>             31 pgs incomplete
>>             1 pgs recovering
>>             29 pgs recovery_wait
>>             1 pgs stale
>>             48 pgs stuck degraded
>>             31 pgs stuck inactive
>>             1 pgs stuck stale
>>             148 pgs stuck unclean
>>             17 pgs stuck undersized
>>             17 pgs undersized
>>             599 requests are blocked > 32 sec
>>             recovery 111489/4697618 objects degraded (2.373%)
>>             recovery 772268/4697618 objects misplaced (16.440%)
>>             recovery 1/2171314 unfound (0.000%)
>>      monmap e5: 3 mons at {bc07s12-a7=172.27.16.11:6789/0,bc07s13-a7=172.27.16.21:6789/0,bc07s14-a7=172.27.16.15:6789/0}
>>             election epoch 198, quorum 0,1,2 bc07s12-a7,bc07s14-a7,bc07s13-a7
>>      osdmap e18727: 25 osds: 25 up, 25 in; 90 remapped pgs
>>       pgmap v70996322: 1792 pgs, 13 pools, 8210 GB data, 2120 kobjects
>>             16783 GB used, 6487 GB / 23270 GB avail
>>             111489/4697618 objects degraded (2.373%)
>>             772268/4697618 objects misplaced (16.440%)
>>             1/2171314 unfound (0.000%)
>>                 1639 active+clean
>>                   66 active+remapped+wait_backfill
>>                   30 incomplete
>>                   25 active+recovery_wait+degraded
>>                   15 active+undersized+degraded+remapped+wait_backfill
>>                    4 active+recovery_wait+degraded+remapped
>>                    4 active+clean+scrubbing
>>                    2 active+remapped+wait_backfill+backfill_toofull
>>                    1 down+incomplete
>>                    1 active+remapped+backfilling
>>                    1 active+clean+scrubbing+deep
>>                    1 stale+active+undersized+degraded
>>                    1 active+undersized+degraded+remapped+backfilling
>>                    1 active+degraded+remapped+backfilling
>>                    1 active+recovering+degraded
>> recovery io 29385 kB/s, 7 objects/s
>>   client io 5877 B/s wr, 1 op/s
>>
>
