That sounds about right; I do see blocked requests sometimes when the cluster is under really heavy load.
Looking at some examples I think summary should list the issues.

"summary": [],
"overall_status": "HEALTH_OK",

I'll try logging that too.

Scott

On Thu, Feb 23, 2017 at 3:00 PM David Turner <david.tur...@storagecraft.com> wrote:

> There are multiple approaches to give you more information about the
> Health state. CLI has these 2 options:
> ceph health detail
> ceph status
>
> I also like using ceph-dash. ( https://github.com/Crapworks/ceph-dash )
> It has an associated nagios check to scrape the ceph-dash page.
>
> I personally do `watch ceph status` when I'm monitoring the cluster
> closely. It will show you things like blocked requests, osds flapping, mon
> clock skew, or whatever your problem is causing the health_warn state. The
> most likely cause for health_warn off and on is blocked requests. Those
> are caused by any number of things that you would need to diagnose further
> if that is what is causing the health_warn state.
>
> ------------------------------
>
> David Turner | Cloud Operations Engineer | StorageCraft Technology
> Corporation <https://storagecraft.com>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 | Mobile: 385.224.2943
>
> ________________________________________
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of John
> Spray [jsp...@redhat.com]
> Sent: Thursday, February 23, 2017 3:47 PM
> To: Scottix
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Random Health_warn
>
> On Thu, Feb 23, 2017 at 9:49 PM, Scottix <scot...@gmail.com> wrote:
> > ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> >
> > We are seeing a weird behavior or not sure how to diagnose what could be
> > going on. We started monitoring the overall_status from the json query and
> > every once in a while we would get a HEALTH_WARN for a minute or two.
> >
> > Monitoring logs.
> > 02/23/2017 07:25:54 AM HEALTH_OK
> > 02/23/2017 07:24:54 AM HEALTH_WARN
> > 02/23/2017 07:23:55 AM HEALTH_OK
> > 02/23/2017 07:22:54 AM HEALTH_OK
> > ...
> > 02/23/2017 05:13:55 AM HEALTH_OK
> > 02/23/2017 05:12:54 AM HEALTH_WARN
> > 02/23/2017 05:11:54 AM HEALTH_WARN
> > 02/23/2017 05:10:54 AM HEALTH_OK
> > 02/23/2017 05:09:54 AM HEALTH_OK
> >
> > When I check the mon leader logs there is no indication of an error or
> > issues that could be occurring. Is there a way to find what is causing the
> > HEALTH_WARN?
>
> Possibly not without grabbing more than just the overall status at the
> same time as you're grabbing the OK/WARN status.
>
> Internally, the OK/WARN/ERROR health state is generated on-demand by
> applying a bunch of checks to the state of the system when the user
> runs the health command -- the system doesn't know it's in a warning
> state until it's asked. Often you will see a corresponding log
> message, but not necessarily.
>
> John
>
> > Best,
> > Scott
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
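For anyone following the thread, here is a rough sketch of what "logging that too" might look like: record the summary entries in the same poll that records overall_status, as John suggests. It assumes the JSON from `ceph status --format json` has a top-level "health" object holding the "summary" and "overall_status" fields quoted above, and that each summary entry carries "severity" and "summary" keys. That layout is an assumption based on the 10.2.x output and may differ on other releases. The function names are made up for illustration.

```python
import json
import subprocess
from datetime import datetime


def format_health(health: dict) -> str:
    """Build one log entry from the (assumed) "health" section of the status JSON."""
    status = health.get("overall_status", "UNKNOWN")
    # Assumed entry shape: {"severity": "HEALTH_WARN", "summary": "<reason>"}
    reasons = [
        "{}: {}".format(item.get("severity", "?"), item.get("summary", "?"))
        for item in health.get("summary", [])
    ]
    if not reasons:
        return status
    return "{} [{}]".format(status, "; ".join(reasons))


def poll_once() -> str:
    """Run `ceph status --format json` once and return a timestamped log line."""
    raw = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    health = json.loads(raw).get("health", {})
    # Same timestamp layout as the monitoring log quoted in the thread.
    stamp = datetime.now().strftime("%m/%d/%Y %I:%M:%S %p")
    return "{} {}".format(stamp, format_health(health))
```

Calling `print(poll_once())` from the existing once-a-minute job would put the reason next to each HEALTH_WARN line, so a short-lived warning can be explained after the fact.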