janhoy opened a new issue, #690: URL: https://github.com/apache/solr-operator/issues/690
Since #307 we now have generic `go` metrics, like mem, gc, threads etc. Let's add application-level metrics for the operator itself that could be useful for Grafana boards and alerts.

Suggestions:

* Gauge of the number of currently managed CRD instances for SolrClouds, SolrBackups, SolrPrometheusExporter
* Gauge for CRDs currently in a failure state
* Reconcile stats
  * Successful vs failed reconcile events, broken down by kind of event
  * Size of pending operations in the reconcile queue (if there is such a thing)
* Operation stats
  * For each operation type (install, upgrade, delete etc): counts and status
  * Stats on scheduled backup requests: number of successes and failures per time unit

The goal would be to make a simple Grafana board where you can filter on namespace etc. to see raw operator health, and at a glance whether some operations are in a failure state. Further filtering by labels like SolrCloud name would let you see the number of failed operations towards each cluster, and when they happened.
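To make the "successful vs failed reconcile events" suggestion more concrete, here is a minimal, dependency-free Go sketch of a labelled counter rendered in the Prometheus text exposition format. Everything named in it is an assumption for illustration only: the metric name `solr_operator_reconcile_total`, the label set (`kind`, `namespace`, `name`, `result`) and the `ReconcileMetrics` type are hypothetical and not taken from the operator code base. A real implementation would more likely register Prometheus client counters with the metrics registry that controller-runtime already exposes, rather than hand-rolling the output.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// errDemo is a hypothetical reconcile failure used only by this example.
var errDemo = errors.New("zookeeper unreachable")

// reconcileKey identifies one counter series. The label set is an assumption.
type reconcileKey struct {
	Kind      string // e.g. "SolrCloud", "SolrBackup", "SolrPrometheusExporter"
	Namespace string
	Name      string
	Result    string // "success" or "failure"
}

// ReconcileMetrics is a concurrency-safe set of labelled counters.
type ReconcileMetrics struct {
	mu     sync.Mutex
	counts map[reconcileKey]uint64
}

// NewReconcileMetrics returns an empty counter set.
func NewReconcileMetrics() *ReconcileMetrics {
	return &ReconcileMetrics{counts: make(map[reconcileKey]uint64)}
}

// Observe records the outcome of one reconcile call for one resource.
func (m *ReconcileMetrics) Observe(kind, namespace, name string, err error) {
	result := "success"
	if err != nil {
		result = "failure"
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts[reconcileKey{kind, namespace, name, result}]++
}

// Count returns the current value of a single series (0 if never observed).
func (m *ReconcileMetrics) Count(kind, namespace, name, result string) uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.counts[reconcileKey{kind, namespace, name, result}]
}

// Render returns all series in Prometheus text format, sorted for stable output.
func (m *ReconcileMetrics) Render() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	lines := make([]string, 0, len(m.counts))
	for k, v := range m.counts {
		lines = append(lines, fmt.Sprintf(
			"solr_operator_reconcile_total{kind=%q,namespace=%q,name=%q,result=%q} %d",
			k.Kind, k.Namespace, k.Name, k.Result, v))
	}
	sort.Strings(lines)
	return "# TYPE solr_operator_reconcile_total counter\n" + strings.Join(lines, "\n") + "\n"
}

func main() {
	m := NewReconcileMetrics()
	m.Observe("SolrCloud", "search", "products", nil)
	m.Observe("SolrCloud", "search", "products", errDemo)
	m.Observe("SolrBackup", "search", "nightly", nil)
	fmt.Print(m.Render())
}
```

With labels shaped like this, a Grafana panel could filter on `namespace` or `name` and plot the rate of `result="failure"` per cluster. Per-resource `name` labels do increase series cardinality, so whether to include them is a design choice to weigh.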