Hi Jeff,

It would probably be wise to first check what these slow requests are:
1) ceph health detail -> This will tell you which OSDs are experiencing the 
slow requests.
2) ceph daemon osd.{id} dump_ops_in_flight -> Issued on the node hosting one of 
the above OSDs, this will tell you what these ops are waiting for.
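
If it helps, the two steps can be chained with something like the sketch below. 
The function names slow_osds and dump_slow_ops are made up for illustration, and 
I am assuming the affected OSDs show up as "osd.<number>" in the health output. 
Since ceph daemon talks to the local admin socket, the second function only 
works for OSDs hosted on the node where you run it.

```shell
#!/bin/sh
# Sketch only, not an official tool.

# Read text on stdin and print each distinct OSD name found in it.
# Assumption: OSDs appear as "osd.<number>" in `ceph health detail` output.
slow_osds() {
    grep -oE 'osd\.[0-9]+' | sort -u
}

# Read OSD names on stdin and dump the in-flight ops of each one.
# Only works for OSDs whose admin socket lives on this node.
dump_slow_ops() {
    while read -r osd; do
        echo "== $osd =="
        ceph daemon "$osd" dump_ops_in_flight
    done
}
```

Usage would be along the lines of: ceph health detail | slow_osds | dump_slow_ops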

My guess is that either you have a network problem, or some other drives in 
your cluster are about to die or are experiencing write errors, causing retries 
and slowing down request processing.

Just to be sure, if your drives are SMART capable, use smartctl to look at the 
stats for the drives you identified in the steps above.
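
As a rough sketch of that check: smartctl -H prints the overall health verdict 
and smartctl -A prints the attribute table. The helper names below are made up, 
and the attribute list is only my assumption of what is worth watching; names 
vary by vendor, so adjust it for your drives. smartctl usually needs root.

```shell
#!/bin/sh
# Sketch only: the attribute names are commonly reported ones, not a complete list.

# Keep only the attribute lines that most often point at a failing disk.
smart_suspects() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
}

# Print the overall health verdict, then the suspect attributes, for one device.
check_drive() {
    smartctl -H "$1"
    smartctl -A "$1" | smart_suspects
}
```

You would call it as check_drive /dev/sdX, with /dev/sdX replaced by the device 
backing the suspect OSD.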

Regards
JC



> On Nov 20, 2014, at 06:00, Jeff <j...@usedmoviefinder.com> wrote:
> 
> Hi,
> 
>       We have a five node cluster that has been running for a long
> time (over a year).  A few weeks ago we upgraded to 0.87 (giant) and 
> things continued to work well.  
> 
>       Last week a drive failed on one of the nodes.  We replaced the
> drive and things were working well again.
> 
>       After about six days we started getting lots of "slow
> requests...blocked for..." messages (100's/hour) and performance has been
> terrible.  Since then we've made sure to have all of the latest OS patches
> and rebooted all five nodes.  We are still seeing a lot of slow
> requests/blocked messages.  Any idea(s) on what's wrong/where to look?
> 
> Thanks!
>       Jeff
> -- 
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

