This morning I went through and enabled the influx plugin in ceph-mgr on 12.2.2, and so far so good. The only non-obvious step was installing the python-influxdb package that it depends on; that probably needs to be baked into the documentation somewhere.
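For anyone else setting this up, the whole thing was roughly the following (Debian/Ubuntu package name shown, and the mgr/influx config-key names are from memory, so verify them against the docs for your release):

$ apt-get install python-influxdb       # dependency the module imports
$ ceph mgr module enable influx
$ ceph config-key set mgr/influx/hostname influxdb.example.com
$ ceph config-key set mgr/influx/database ceph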
Other than that, 90% of the stats I use are in this, and a few breakdowns of my existing statistics are available now.

If I had to make a wishlist of stats I wish were part of this:

- PG state stats - number of PGs active, clean, scrubbing, scrubbing-deep, backfilling, recovering, etc.
- Pool ops - we have pool-level rd/wr_bytes; would love to see pool-level rd/wr_ops as well.
- Cluster-level object state stats - total objects, degraded, misplaced, unfound, etc.
- Daemon (osd/mon/mds/mgr) state stats - total, up, in, active, degraded/failed, quorum, etc.
- osd recovery_bytes - recovery bytes to complement ops (like ceph -s provides).

Otherwise, this seems to be a much better approach to data collection and shipping than CollectD, as it eliminates the middleman and puts the mgr daemons to work. I'd love to see the ceph-mgr daemons grow in capability like this, take load off the mons, and provide more useful functionality.

Thanks,
Reed

> On Jan 11, 2018, at 10:02 AM, Benjeman Meekhof <bmeek...@umich.edu> wrote:
>
> Hi Reed,
>
> Someone in our group originally wrote the plugin and put in the PR. Since our commit the plugin was 'forward-ported' to master and made incompatible with Luminous, so we've been using our own version of the plugin while waiting for the necessary pieces to be back-ported to Luminous to use the modified upstream version. Now we are in the process of trying out the back-ported version that is in 12.2.2, as well as adding some additional code from our version that collects pg summary information (count of active, etc.) and supports sending to multiple influx destinations. We'll attempt to PR any changes we make.
>
> So to answer your question: yes, we use it, but not exactly the version from upstream in production yet. However, in our testing the module included with 12.2.2 appears to work as expected, and we're planning to move over to it and do any future work based on the version in the upstream Ceph tree.
>
> There is one issue/bug that may still exist: because the data point timestamps are written inside a loop through OSD stats, the spread is sometimes wide enough that Grafana doesn't group properly and you get the appearance of extreme spikes in derivative calculations of rates. We ended up modifying our code to calculate timestamps just outside the loops that create data points and apply the same timestamp to every point created in those loops. Of course we'll feed that back upstream when we get to it, assuming it is still an issue in the current code.
>
> thanks,
> Ben
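For reference, a minimal sketch of the pattern Ben describes (hypothetical names throughout, not the actual module's code): take the timestamp once, before the loop, and stamp every point in the batch with it.

from datetime import datetime

def gather_osd_points(osd_stats):
    # Take the timestamp ONCE, outside the loop, so every point in this
    # batch shares it and Grafana can group the series cleanly instead
    # of seeing a spread of near-identical timestamps.
    now = datetime.utcnow().isoformat() + 'Z'

    points = []
    for osd_id, stats in osd_stats.items():
        for stat_name, value in stats.items():
            points.append({
                'measurement': 'ceph_osd_stats',   # hypothetical name
                'tags': {'osd_id': str(osd_id)},
                'time': now,                       # shared timestamp
                'fields': {stat_name: value},
            })
    return points

Each point dict is in the shape python-influxdb's write_points() accepts, so the whole batch can then be shipped in a single call.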
> On Thu, Jan 11, 2018 at 2:04 AM, Reed Dier <reed.d...@focusvq.com> wrote:
>> Hi all,
>>
>> Does anyone have any idea if the influx plugin for ceph-mgr is stable in 12.2.2?
>>
>> Would love to ditch collectd and report directly from ceph if that is the case.
>>
>> Documentation says that it is added in Mimic/13.x; however, it looks like from an earlier ML post that it would be coming to Luminous:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021302.html
>>
>> I also see it as a disabled module currently:
>>
>> $ ceph mgr module ls
>> {
>>     "enabled_modules": [
>>         "dashboard",
>>         "restful",
>>         "status"
>>     ],
>>     "disabled_modules": [
>>         "balancer",
>>         "influx",
>>         "localpool",
>>         "prometheus",
>>         "selftest",
>>         "zabbix"
>>     ]
>> }
>>
>> Curious if anyone has been using it in place of CollectD/Telegraf for feeding InfluxDB with statistics.
>>
>> Thanks,
>>
>> Reed
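And for anyone who wants to sanity-check that points are actually landing once the module is enabled, something like this against the stock influx CLI should do it (assuming the target database is named 'ceph'; the measurement names the module writes vary by version, so I won't guess at them):

$ influx -database ceph -execute 'SHOW MEASUREMENTS'
$ influx -database ceph -execute 'SHOW SERIES LIMIT 5'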