On Mon, Jul 10, 2017 at 1:00 PM Sage Weil <sw...@redhat.com> wrote:
> On Mon, 10 Jul 2017, Ruben Kerkhof wrote:
> > On Mon, Jul 10, 2017 at 7:44 PM, Sage Weil <sw...@redhat.com> wrote:
> > > On Mon, 10 Jul 2017, Gregory Farnum wrote:
> > >> On Mon, Jul 10, 2017 at 12:57 AM Marc Roos <m.r...@f1-outsourcing.eu> wrote:
> > >>
> > >> I need a little help with fixing some errors I am having.
> > >>
> > >> After upgrading from Kraken I'm getting incorrect values reported on
> > >> placement groups etc. At first I thought it was because I was changing
> > >> the public cluster IP address range and modifying the monmap directly.
> > >> But after deleting and adding a monitor, this ceph daemon dump is
> > >> still incorrect.
> > >>
> > >> ceph daemon mon.a perf dump cluster
> > >> {
> > >>     "cluster": {
> > >>         "num_mon": 3,
> > >>         "num_mon_quorum": 3,
> > >>         "num_osd": 6,
> > >>         "num_osd_up": 6,
> > >>         "num_osd_in": 6,
> > >>         "osd_epoch": 3842,
> > >>         "osd_bytes": 0,
> > >>         "osd_bytes_used": 0,
> > >>         "osd_bytes_avail": 0,
> > >>         "num_pool": 0,
> > >>         "num_pg": 0,
> > >>         "num_pg_active_clean": 0,
> > >>         "num_pg_active": 0,
> > >>         "num_pg_peering": 0,
> > >>         "num_object": 0,
> > >>         "num_object_degraded": 0,
> > >>         "num_object_misplaced": 0,
> > >>         "num_object_unfound": 0,
> > >>         "num_bytes": 0,
> > >>         "num_mds_up": 1,
> > >>         "num_mds_in": 1,
> > >>         "num_mds_failed": 0,
> > >>         "mds_epoch": 816
> > >>     }
> > >> }
> > >>
> > >> Huh, I didn't know that existed.
> > >>
> > >> So, yep, most of those values aren't updated any more. From a grep,
> > >> you can still trust:
> > >> num_mon
> > >> num_mon_quorum
> > >> num_osd
> > >> num_osd_up
> > >> num_osd_in
> > >> osd_epoch
> > >> num_mds_up
> > >> num_mds_in
> > >> num_mds_failed
> > >> mds_epoch
> > >>
> > >> We might be able to keep updating the others when we get reports from
> > >> the manager, but it'd be simpler to just rip them out -- I don't think
> > >> the admin socket is really the right place to get cluster summary data
> > >> like this. Sage, any thoughts?
> > >
> > > These were added to fill a gap when operators are collecting everything
> > > via collectd or similar.
> >
> > Indeed, this has been reported as
> > https://github.com/collectd/collectd/issues/2345
> >
> > > Getting the same cluster-level data from multiple mons is redundant,
> > > but it avoids having to code up a separate collector that polls the
> > > CLI or something.
> > >
> > > I suspect once we're funneling everything through a mgr module this
> > > problem will go away and we can remove this.
> >
> > That would be great; having collectd running on each monitor always felt
> > a bit weird. If anyone wants to contribute patches to the collectd Ceph
> > plugin to support the mgr, we would really appreciate that.
>
> To be clear, what we're currently working on right here is a *prometheus*
> module/plugin for mgr that will funnel the metrics for *all* ceph daemons
> through a single endpoint to prometheus. I suspect we can easily include
> the cluster-level stats there.
>
> I'm not sure what the situation looks like with collectd, or whether there
> is any interest in or work on making mgr behave like a proxy for all of
> the cluster and daemon stats.
>
> > > Until then, these are easy to fix by populating from PGMapDigest...
> > > my vote is we do that!
> >
> > Yes please :)
>
> I've added a ticket for luminous:
>
> http://tracker.ceph.com/issues/20563
>
> sage
https://github.com/ceph/ceph/pull/16249

Checked with vstart and that appears to resolve it correctly. :)
-Greg
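
In the meantime, anyone who still needs these numbers from the mons can
limit their collector to the fields Greg lists above as still being
maintained. Below is a minimal sketch (not part of the original thread):
it shells out to the same "ceph daemon mon.a perf dump cluster" command
shown above and drops the stale counters. The mon name and the assumption
that the ceph CLI can reach the local admin socket are placeholders for
your own setup.

#!/usr/bin/env python
# Sketch only: read the mon's cluster perf counters via the admin socket
# command from the thread and keep just the fields that are still updated.
# "mon.a" and local CLI/socket access are assumptions about the deployment.
import json
import subprocess

STILL_UPDATED = {
    "num_mon", "num_mon_quorum",
    "num_osd", "num_osd_up", "num_osd_in", "osd_epoch",
    "num_mds_up", "num_mds_in", "num_mds_failed", "mds_epoch",
}

def cluster_counters(mon="mon.a"):
    out = subprocess.check_output(
        ["ceph", "daemon", mon, "perf", "dump", "cluster"])
    cluster = json.loads(out.decode("utf-8"))["cluster"]
    return {k: v for k, v in cluster.items() if k in STILL_UPDATED}

if __name__ == "__main__":
    print(json.dumps(cluster_counters(), indent=4, sort_keys=True))

The stale fields (num_pg, num_object, the byte counters and so on) are
dropped entirely rather than reported as zeros, which avoids graphing
misleading flat lines until the PGMapDigest fix lands.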
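
Once the mgr prometheus module Sage mentions is in place, the same data
should come from a single endpoint rather than from every mon. A rough
sketch of a scrape check follows, with the caveat that the host, port 9283
and the /metrics path are assumptions based on later defaults rather than
anything stated in this thread.

#!/usr/bin/env python
# Sketch only: fetch the text-format metrics exposed by the ceph-mgr
# prometheus module. Host, port and path are assumptions; point them at
# wherever your active mgr is actually listening.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

def fetch_mgr_metrics(host="mgr-host.example.com", port=9283):
    return urlopen("http://%s:%d/metrics" % (host, port)).read().decode("utf-8")

if __name__ == "__main__":
    # Metric names and labels depend on the module version, so just skip
    # the comment lines and print whatever the endpoint reports.
    for line in fetch_mgr_metrics().splitlines():
        if line and not line.startswith("#"):
            print(line)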