Hello,

since we upgraded to Luminous (12.2.2), we have been using the built-in Ceph exporter (the mgr prometheus module) to get Ceph metrics into Prometheus. At random times we get an Internal Server Error from the exporter, with Python raising a KeyError for a seemingly random metric. Often it is one of the "pg_*" metrics.

Here is an example of the Python exception:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
    metrics = global_instance().collect()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
    self.get_pg_status()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in get_pg_status
    self.metrics[path].set(value)
KeyError: 'pg_deep'

After a while (sometimes 3-5 minutes, sometimes as long as 40 minutes), the metrics endpoint starts working again without any intervention.

Does anyone have an idea what could be done about this, or has anyone experienced similar problems?

Thanks,
Falk
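
To make the failure mode concrete, here is a minimal sketch. It is not the actual module code. It assumes the exporter keeps a dict of metrics registered up front and looks each PG state up by name, which is what the last traceback frame (self.metrics[path].set(value)) suggests. The Metric class, the list of known states, and both helper functions are hypothetical and only illustrate how an unregistered name such as "pg_deep" would raise a KeyError, and how a tolerant lookup would avoid it.

```python
class Metric(object):
    """Hypothetical stand-in for a gauge-style metric object."""

    def __init__(self, name, desc):
        self.name = name
        self.desc = desc
        self.value = None

    def set(self, value):
        self.value = value


# Assumption: only these PG states were registered in advance.
KNOWN_PG_STATES = ["active", "clean", "scrubbing"]


def build_metrics(states):
    """Register one metric per known PG state, keyed as 'pg_<state>'."""
    metrics = {}
    for state in states:
        path = "pg_" + state
        metrics[path] = Metric(path, "PG {} count".format(state))
    return metrics


def set_strict(metrics, path, value):
    """Mirrors the failing line: raises KeyError for an unknown name."""
    metrics[path].set(value)


def set_tolerant(metrics, path, value):
    """Registers the metric on first sight instead of failing."""
    metric = metrics.get(path)
    if metric is None:
        metric = Metric(path, "registered on demand")
        metrics[path] = metric
    metric.set(value)


if __name__ == "__main__":
    metrics = build_metrics(KNOWN_PG_STATES)
    try:
        set_strict(metrics, "pg_deep", 3)
    except KeyError as exc:
        print("strict lookup failed: {!r}".format(exc))
    set_tolerant(metrics, "pg_deep", 3)
    print(metrics["pg_deep"].value)
```

If this assumption holds, the error would appear only while some PG is in a state that was not registered, and disappear once that state is gone, which would match the intermittent behaviour described above.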