We are just going into production with a large-ish multisite radosgw setup on
Ceph 10.2. We are starting off by alerting on anything that isn't HEALTH_OK,
just to see how things go. If we get HEALTH_WARN but no mons or OSDs are
down, then it will be a low-level alert. We will adjust our scripts to pick
up on other conditions as they come up.
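
For illustration, here is a rough sketch of that rule in Python. It is not our
actual script. The "high" level for everything else, the helper names, and the
JSON field names (num_osds, num_up_osds, quorum_names, monmap.mons) are
assumptions; the JSON layout of these commands differs between Ceph releases,
so check the output on your own cluster first.

```python
import json
import subprocess


def classify(status, mons_down, osds_down):
    """Map cluster state to an alert level: None, "low", or "high"."""
    if status == "HEALTH_OK":
        return None
    # HEALTH_WARN with every mon and OSD up is only a low-level alert.
    if status == "HEALTH_WARN" and mons_down == 0 and osds_down == 0:
        return "low"
    # Assumption: anything else (HEALTH_ERR, daemons down) is high priority.
    return "high"


def parse_health(text):
    """Return the leading HEALTH_* token from plain `ceph health` output."""
    tokens = text.split()
    return tokens[0] if tokens else "UNKNOWN"


def count_down_osds(osd_stat_json):
    """Count OSDs that are not up, from `ceph osd stat --format json`."""
    data = json.loads(osd_stat_json)
    # Assumption: some releases nest the counters under an "osdmap" key.
    data = data.get("osdmap", data)
    return data["num_osds"] - data["num_up_osds"]


def count_down_mons(quorum_json):
    """Count mons outside quorum, from `ceph quorum_status --format json`."""
    data = json.loads(quorum_json)
    return len(data["monmap"]["mons"]) - len(data["quorum_names"])


def run_ceph(*args):
    """Run a ceph CLI command and return its stdout as text."""
    return subprocess.check_output(("ceph",) + args).decode()


def main():
    level = classify(
        parse_health(run_ceph("health")),
        count_down_mons(run_ceph("quorum_status", "--format", "json")),
        count_down_osds(run_ceph("osd", "stat", "--format", "json")),
    )
    print(level or "ok")
```

Calling main() from cron or a monitoring check prints "ok", "low", or "high".
If the ceph command itself fails or times out, check_output raises, and that
should probably be treated as an alert in its own right.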

We're using graphite via collectd for visualization.

    -- Trey


On Fri, Jan 13, 2017 at 3:15 PM, Chris Jones <cjo...@cloudm2.com> wrote:

> General question/survey:
>
> Those that have larger clusters, how are you doing alerting/monitoring?
> Meaning, do you trigger off of 'HEALTH_WARN', etc? Not really talking about
> collectd related but more on initial alerts of an issue or potential issue?
> What threshold do you use basically? Just trying to get a pulse of what
> others are doing.
>
> Thanks in advance.
>
> --
> Best Regards,
> Chris Jones
> Bloomberg
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
