After a power failure left our Jewel cluster crippled, I have hit a sticking
point in the attempted recovery.

Out of 8 OSDs, we likely lost 5-6, and we are trying to salvage what we can.

In addition to plain RADOS pools, we were also using CephFS, and the
cephfs.metadata and cephfs.data pools likely lost plenty of PGs.
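
For reference, this is how I have been gauging the damage so far; nothing more
exotic than the standard status commands (happy to paste the full output if it
helps):

> # ceph -s
> # ceph osd tree
> # ceph health detail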

The MDS has reported this ever since coming back from the power loss:
> # ceph mds stat
> e3627: 1/1/1 up {0=core=up:replay}

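My next step was going to be dumping the outstanding RADOS requests from the
MDS's admin socket, to see which objects replay is blocked on (assuming the
default socket path and that the daemon name is "core", as the mds stat output
suggests):

> # ceph --admin-daemon /var/run/ceph/ceph-mds.core.asok objecter_requests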

Looking at the slow request on the OSD, it shows this op, which I can't quite
figure out. Any help is appreciated.

> # ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight
> {
>     "ops": [
>         {
>             "description": "osd_op(mds.0.3625:8 6.c5265ab3 (undecoded) 
> ack+retry+read+known_if_redirected+full_force e3668)",
>             "initiated_at": "2016-08-31 10:37:18.833644",
>             "age": 22212.235361,
>             "duration": 22212.235379,
>             "type_data": [
>                 "no flag points reached",
>                 [
>                     {
>                         "time": "2016-08-31 10:37:18.833644",
>                         "event": "initiated"
>                     }
>                 ]
>             ]
>         }
>     ],
>     "num_ops": 1
> }
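
As far as I can read it, that is a read from mds.0 against an object in pool 6
(presumably cephfs.metadata, since replay reads the MDS journal from the
metadata pool), and it has sat at "no flag points reached" for roughly six
hours, which I take to mean the PG holding that object has never gone active.
To confirm, I was planning to list the unhealthy PGs in that pool, along these
lines (assuming cephfs.metadata is the pool name and that `pg ls-by-pool` is
available on Jewel):

> # ceph health detail | grep -E 'down|incomplete|stale'
> # ceph pg ls-by-pool cephfs.metadata | grep -v 'active+clean'

Does that sound like the right way to pin down which PGs are blocking the MDS
replay?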

Thanks,

Reed