We're just going into production now with a large-ish multisite radosgw setup on 10.2. We are starting off by alerting on anything that isn't HEALTH_OK, just to see how things go. If we get HEALTH_WARN but no mons or OSDs are down, it will be a low-level alert. We will adjust the scripts over time to pick up on different conditions.
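The rule described above could be sketched roughly as follows. This is only an illustration: the function name `classify_alert`, the severity labels, and the choice to pass in mon/OSD counts are assumptions, not the actual scripts. In particular, treating every other non-OK state as "high" is a guess, since the message only spells out the low-level case.

```python
def classify_alert(health, mons_total, mons_in_quorum, osds_total, osds_up):
    """Map a cluster health summary to an alert level.

    Hypothetical helper: returns None (no alert), "low", or "high".
    The caller is assumed to have already extracted the health string
    and the mon/OSD counts from the cluster status output.
    """
    # HEALTH_OK never alerts.
    if health == "HEALTH_OK":
        return None

    # Any mon out of quorum or any OSD not up counts as "daemons down".
    daemons_down = mons_in_quorum < mons_total or osds_up < osds_total

    # HEALTH_WARN with every mon and OSD up is the low-level case.
    if health == "HEALTH_WARN" and not daemons_down:
        return "low"

    # Assumption: everything else (HEALTH_ERR, or WARN with daemons
    # down) is escalated.
    return "high"


if __name__ == "__main__":
    print(classify_alert("HEALTH_WARN", 3, 3, 120, 120))  # low
    print(classify_alert("HEALTH_WARN", 3, 3, 120, 118))  # high
```

Keeping the decision in a small pure function like this makes it easy to add further conditions later without touching whatever collects the status.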
We're using graphite via collectd for visualization.

--
Trey

On Fri, Jan 13, 2017 at 3:15 PM, Chris Jones <cjo...@cloudm2.com> wrote:
> General question/survey:
>
> Those that have larger clusters, how are you doing alerting/monitoring?
> Meaning, do you trigger off of 'HEALTH_WARN', etc? Not really talking about
> collectd related but more on initial alerts of an issue or potential issue?
> What threshold do you use basically? Just trying to get a pulse of what
> others are doing.
>
> Thanks in advance.
>
> --
> Best Regards,
> Chris Jones
> Bloomberg
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com