On Mon, Jul 10, 2017 at 1:00 PM Sage Weil <sw...@redhat.com> wrote:
> On Mon, 10 Jul 2017, Ruben Kerkhof wrote:
> > On Mon, Jul 10, 2017 at 7:44 PM, Sage Weil <sw...@redhat.com> wrote:
> > > On Mon, 10 Jul 2017, Gregory Farnum wrote:
> > >> On Mon, Jul 10, 2017 at 12:57 AM Marc Roos <m.r...@f1-outsourcing.eu> wrote:
> > >>
> > >> I need a little help with fixing some errors I am having.
> > >>
> > >> After upgrading from Kraken I'm getting incorrect values reported on
> > >> placement groups etc. At first I thought it was because I was changing
> > >> the public cluster IP address range and modifying the monmap directly.
> > >> But after deleting and adding a monitor, this ceph daemon dump is
> > >> still incorrect.
> > >>
> > >> ceph daemon mon.a perf dump cluster
> > >> {
> > >>     "cluster": {
> > >>         "num_mon": 3,
> > >>         "num_mon_quorum": 3,
> > >>         "num_osd": 6,
> > >>         "num_osd_up": 6,
> > >>         "num_osd_in": 6,
> > >>         "osd_epoch": 3842,
> > >>         "osd_bytes": 0,
> > >>         "osd_bytes_used": 0,
> > >>         "osd_bytes_avail": 0,
> > >>         "num_pool": 0,
> > >>         "num_pg": 0,
> > >>         "num_pg_active_clean": 0,
> > >>         "num_pg_active": 0,
> > >>         "num_pg_peering": 0,
> > >>         "num_object": 0,
> > >>         "num_object_degraded": 0,
> > >>         "num_object_misplaced": 0,
> > >>         "num_object_unfound": 0,
> > >>         "num_bytes": 0,
> > >>         "num_mds_up": 1,
> > >>         "num_mds_in": 1,
> > >>         "num_mds_failed": 0,
> > >>         "mds_epoch": 816
> > >>     }
> > >> }
> > >>
> > >> Huh, I didn't know that existed.
> > >>
> > >> So, yep, most of those values aren't updated any more. From a grep,
> > >> you can still trust:
> > >> num_mon
> > >> num_mon_quorum
> > >> num_osd
> > >> num_osd_up
> > >> num_osd_in
> > >> osd_epoch
> > >> num_mds_up
> > >> num_mds_in
> > >> num_mds_failed
> > >> mds_epoch
> > >>
> > >> We might be able to keep updating the others when we get reports from
> > >> the manager, but it'd be simpler to just rip them out -- I don't think
> > >> the admin socket is really the right place to get cluster summary data
> > >> like this. Sage, any thoughts?
> > >
> > > These were added to fill a gap when operators are collecting everything
> > > via collectd or similar.
> >
> > Indeed, this has been reported as
> > https://github.com/collectd/collectd/issues/2345
> >
> > > Getting the same cluster-level data from multiple mons is redundant,
> > > but it avoids having to code up a separate collector that polls the
> > > CLI or something.
> > >
> > > I suspect once we're funneling everything through a mgr module this
> > > problem will go away and we can remove this.
> >
> > That would be great; having collectd running on each monitor always felt
> > a bit weird. If anyone wants to contribute patches to the collectd Ceph
> > plugin to support the mgr, we would really appreciate that.
>
> To be clear, what we're currently working on right here is a *prometheus*
> module/plugin for mgr that will funnel the metrics for *all* ceph daemons
> through a single endpoint to prometheus. I suspect we can easily include
> the cluster-level stats there.
>
> I'm not sure what the situation looks like with collectd, or whether there
> is any interest in or work on making mgr behave like a proxy for all of
> the cluster and daemon stats.
>
> > > Until then, these are easy to fix by populating from PGMapDigest...
> > > my vote is we do that!
> >
> > Yes please :)
>
> I've added a ticket for luminous:
>
> http://tracker.ceph.com/issues/20563
>
> sage
https://github.com/ceph/ceph/pull/16249

Checked with vstart and that appears to resolve it correctly. :)
-Greg
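
In the meantime, anyone who still needs these numbers from the mons can
limit their collector to the fields Greg lists above as still being
maintained. Below is a minimal sketch (not part of the original thread):
it shells out to the same "ceph daemon mon.a perf dump cluster" command
shown above and drops the stale counters. The mon name and the assumption
that the ceph CLI can reach the local admin socket are placeholders for
your own setup.

#!/usr/bin/env python
# Sketch only: read the mon's cluster perf counters via the admin socket
# command from the thread and keep just the fields that are still updated.
# "mon.a" and local CLI/socket access are assumptions about the deployment.
import json
import subprocess

STILL_UPDATED = {
    "num_mon", "num_mon_quorum",
    "num_osd", "num_osd_up", "num_osd_in", "osd_epoch",
    "num_mds_up", "num_mds_in", "num_mds_failed", "mds_epoch",
}

def cluster_counters(mon="mon.a"):
    out = subprocess.check_output(
        ["ceph", "daemon", mon, "perf", "dump", "cluster"])
    cluster = json.loads(out.decode("utf-8"))["cluster"]
    return {k: v for k, v in cluster.items() if k in STILL_UPDATED}

if __name__ == "__main__":
    print(json.dumps(cluster_counters(), indent=4, sort_keys=True))

The stale fields (num_pg, num_object, the byte counters and so on) are
dropped entirely rather than reported as zeros, which avoids graphing
misleading flat lines until the PGMapDigest fix lands.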
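
Once the mgr prometheus module Sage mentions is in place, the same data
should come from a single endpoint rather than from every mon. A rough
sketch of a scrape check follows, with the caveat that the host, port 9283
and the /metrics path are assumptions based on later defaults rather than
anything stated in this thread.

#!/usr/bin/env python
# Sketch only: fetch the text-format metrics exposed by the ceph-mgr
# prometheus module. Host, port and path are assumptions; point them at
# wherever your active mgr is actually listening.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

def fetch_mgr_metrics(host="mgr-host.example.com", port=9283):
    return urlopen("http://%s:%d/metrics" % (host, port)).read().decode("utf-8")

if __name__ == "__main__":
    # Metric names and labels depend on the module version, so just skip
    # the comment lines and print whatever the endpoint reports.
    for line in fetch_mgr_metrics().splitlines():
        if line and not line.startswith("#"):
            print(line)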