Hello,

Since we upgraded to Luminous (12.2.2), we have been using the built-in
Ceph exporter (the mgr prometheus module) to get the Ceph metrics into
Prometheus. At random times we get an Internal Server Error from the
exporter, with Python raising a KeyError for some seemingly random
metric. Often it is one of the "pg_*" metrics.

Here is an example of the Python exception:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
        response.body = self.handler()
      File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
        self.body = self.oldhandler(*args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
        return self.callable(*self.args, **self.kwargs)
      File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
        metrics = global_instance().collect()
      File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
        self.get_pg_status()
      File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in get_pg_status
        self.metrics[path].set(value)
    KeyError: 'pg_deep'

After a certain time (anywhere from 3-5 minutes to, sometimes, as long
as 40 minutes), the metrics endpoint starts working again on its own,
without any intervention.

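To illustrate one possible reading of the traceback (a guess only, not
confirmed against the module source): if self.metrics were a dict
pre-populated with a fixed set of PG state names, a state that is
reported only occasionally, such as "deep" during deep scrubs, would
raise a KeyError until it disappears again. The sketch below reproduces
that pattern with a guard; Metric and set_pg_states are hypothetical
names used for illustration, not the actual ceph-mgr code.

```python
# Hypothetical stand-ins; NOT the actual ceph-mgr prometheus module code.
class Metric(object):
    def __init__(self, name):
        self.name = name
        self.value = None

    def set(self, value):
        self.value = value


def set_pg_states(metrics, pg_states):
    """Set one 'pg_<state>' metric per reported state, creating missing ones."""
    for state, count in pg_states.items():
        path = 'pg_{}'.format(state)
        if path not in metrics:
            # Without this guard, metrics[path] raises KeyError
            # for any state that was not registered up front.
            metrics[path] = Metric(path)
        metrics[path].set(count)
    return metrics


# Assumed starting point: only two states are known in advance.
metrics = {'pg_active': Metric('pg_active'), 'pg_clean': Metric('pg_clean')}
set_pg_states(metrics, {'active': 128, 'clean': 120, 'deep': 3})
```

If that assumption holds, it would also fit the errors going away on
their own once the unregistered state is no longer reported.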

Does anyone have an idea what could be done about this, or has anyone
experienced similar problems?

Thanks,
Falk

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
