Re: [ceph-users] Random Health_warn

Scottix Thu, 23 Feb 2017 15:19:26 -0800

That sounds about right. I do see blocked requests sometimes when the
cluster is under really heavy load.


Looking at some examples, I think "summary" should list the issues:
"summary": [],
"overall_status": "HEALTH_OK",

I'll try logging that too.
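
A rough sketch of what that logging could look like. It assumes the
Jewel-era layout where `ceph status --format json` nests "overall_status"
and "summary" under a "health" key, and where each summary entry carries a
"summary" text field. The function names are made up for illustration, and
none of this has been checked against other releases:

```python
import json
import subprocess
from datetime import datetime


def format_health_line(status_json, now=None):
    """Build one log line from the JSON text of `ceph status --format json`.

    Assumption: the health data sits under a "health" key. If that key is
    missing, fall back to the top level, which is the shape assumed for
    `ceph health --format json`.
    """
    data = json.loads(status_json)
    health = data.get("health", data)
    overall = health.get("overall_status", "UNKNOWN")
    # Each summary entry is assumed to look like
    # {"severity": "HEALTH_WARN", "summary": "..."}.
    reasons = [item.get("summary", "") for item in health.get("summary", [])]
    stamp = (now or datetime.now()).strftime("%m/%d/%Y %I:%M:%S %p")
    line = f"{stamp} {overall}"
    if reasons:
        line += " " + "; ".join(reasons)
    return line


def poll_once():
    """Run the CLI once so the status and its reasons share one snapshot."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return format_health_line(out)
```

Calling poll_once() from the existing one-minute check and appending the
result to the log should put the reason next to each HEALTH_WARN entry,
since both come from the same query.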

Scott

On Thu, Feb 23, 2017 at 3:00 PM David Turner <david.tur...@storagecraft.com>
wrote:

> There are several ways to get more information about the health
> state.  The CLI has these two options:
> ceph health detail
> ceph status
>
> I also like using ceph-dash.  ( https://github.com/Crapworks/ceph-dash )
>  It has an associated nagios check to scrape the ceph-dash page.
>
> I personally run `watch ceph status` when I'm monitoring the cluster
> closely.  It will show you things like blocked requests, OSDs flapping, mon
> clock skew, or whatever else is causing the HEALTH_WARN state.  The
> most likely cause of an intermittent HEALTH_WARN is blocked requests.  Those
> can be caused by any number of things, which you would need to diagnose
> further if that turns out to be what is triggering the HEALTH_WARN state.
>
> ------------------------------
>
> David Turner | Cloud Operations Engineer | StorageCraft Technology
> Corporation <https://storagecraft.com>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 | Mobile: 385.224.2943
>
> ________________________________________
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of John
> Spray [jsp...@redhat.com]
> Sent: Thursday, February 23, 2017 3:47 PM
> To: Scottix
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Random Health_warn
>
>
> On Thu, Feb 23, 2017 at 9:49 PM, Scottix <scot...@gmail.com> wrote:
> > ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> >
> > We are seeing some odd behavior and are not sure how to diagnose what
> > could be going on. We started monitoring overall_status from the JSON
> > query, and every once in a while we get a HEALTH_WARN for a minute or two.
> >
> > Monitoring logs.
> > 02/23/2017 07:25:54 AM HEALTH_OK
> > 02/23/2017 07:24:54 AM HEALTH_WARN
> > 02/23/2017 07:23:55 AM HEALTH_OK
> > 02/23/2017 07:22:54 AM HEALTH_OK
> > ...
> > 02/23/2017 05:13:55 AM HEALTH_OK
> > 02/23/2017 05:12:54 AM HEALTH_WARN
> > 02/23/2017 05:11:54 AM HEALTH_WARN
> > 02/23/2017 05:10:54 AM HEALTH_OK
> > 02/23/2017 05:09:54 AM HEALTH_OK
> >
> > When I check the mon leader logs, there is no indication of an error or
> > issue that could be occurring. Is there a way to find out what is causing
> > the HEALTH_WARN?
>
> Possibly not without grabbing more than just the overall status at the
> same time as you're grabbing the OK/WARN status.
>
> Internally, the OK/WARN/ERROR health state is generated on-demand by
> applying a bunch of checks to the state of the system when the user
> runs the health command -- the system doesn't know it's in a warning
> state until it's asked.  Often you will see a corresponding log
> message, but not necessarily.
>
> John
>
> > Best,
> > Scott
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
