I ran into the same sort of thing with Hammer. Looking forward to your results; please post your configuration when done. I am contemplating doing something similar to resolve my issues, and I would be interested in knowing your outcome first.
//Tu

On Thu, Apr 28, 2016 at 1:18 PM -0700, "Andrus, Brian Contractor" <bdand...@nps.edu> wrote:

Load on all nodes is 1.04 to 1.07.
I am updating now to Jewel 10.2 (from 9.2).
This is CephFS with SSD journals.
Hopefully the update to Jewel fixes lots.

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

From: Lincoln Bryant [mailto:linco...@uchicago.edu]
Sent: Thursday, April 28, 2016 12:56 PM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

OK, a few more questions.

What does the load look like on the OSDs with ‘iostat’ during the rsync?
What version of Ceph?
Are you using RBD, CephFS, something else?
SSD journals or no?

--Lincoln

On Apr 28, 2016, at 2:53 PM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:

Lincoln,

That was the odd thing to me. Ceph health detail listed all 4 OSDs, so I checked all the systems.
I have since let it settle until it is OK again and started. Within a couple minutes, it started showing blocked requests and they are indeed on all 4 OSDs.

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

From: Lincoln Bryant [mailto:linco...@uchicago.edu]
Sent: Thursday, April 28, 2016 12:31 PM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Troubleshoot blocked OSDs

Hi Brian,

The first thing you can do is “ceph health detail”, which should give you some more information about which OSD(s) have blocked requests.
If it’s isolated to one OSD in particular, perhaps use iostat to check utilization and/or smartctl to check health.

--Lincoln

On Apr 28, 2016, at 2:26 PM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:

All,

I have a small ceph cluster with 4 OSDs and 3 MONs on 4 systems.
I was rsyncing about 50TB of files and things get very slow. To the point I stopped the rsync, but even with everything stopped, I see:

health HEALTH_WARN
80 requests are blocked > 32 sec

The number was as high as 218, but they seem to be draining down.
I see no issues on any of the systems, CPU load is low, memory usage is low.
How do I go about finding why a request is blocked for so long? These have been hitting >500 seconds for block time.

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
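For anyone following along: the first step suggested in the thread (using "ceph health detail" to see which OSDs the blocked requests sit on) can be tallied with a few lines of Python. This is only a rough sketch. It assumes the per-OSD lines look like "N ops are blocked > X sec on osd.N", which is not shown anywhere in this thread and may differ between releases, so check it against your own output first. The function name blocked_by_osd and the regex are hypothetical helpers, not part of Ceph.

```python
import re
from collections import defaultdict

# ASSUMPTION: per-OSD lines in `ceph health detail` have this shape.
# It is not taken from the thread; verify against your own cluster.
BLOCKED_LINE = re.compile(
    r"^\s*(?P<count>\d+) ops are blocked > "
    r"(?P<secs>\d+(?:\.\d+)?) sec on (?P<osd>osd\.\d+)\s*$"
)


def blocked_by_osd(health_detail_text):
    """Return {osd_name: (total_blocked_ops, worst_threshold_seconds)}.

    Lines that do not match the assumed shape are ignored.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in health_detail_text.splitlines():
        match = BLOCKED_LINE.match(line)
        if match is None:
            continue
        entry = totals[match.group("osd")]
        entry[0] += int(match.group("count"))
        entry[1] = max(entry[1], float(match.group("secs")))
    return {osd: (ops, secs) for osd, (ops, secs) in totals.items()}
```

To use it, save the output of "ceph health detail" to a file and pass the file's contents to blocked_by_osd(). If a single OSD dominates the result, that is the one to look at with iostat and smartctl; if the blocked ops are spread evenly across all OSDs, as reported in this thread, a single failing disk is a less likely explanation.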
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com