Hi, 

*Short summary:*
Can a single metric report three values (0/1/2) to indicate status, 
instead of using labels or encoding the status in the metric name?

*Full story:*
I'm working on a general recommendation for reporting a 
Kubernetes operator health metric.

The full proposal is here:
https://github.com/operator-framework/operator-sdk/pull/6315/files

I proposed recommending that operators add a new health metric with the 
following name:
*<operator-name-prefix>_operator_health_status* [1]

I proposed that the values of this metric would indicate the health status:
  * `0` - Indicates that the operator is healthy and working as expected.
  * `1` - Indicates that the operator has some issues that need to be 
addressed and that could lead to loss of functionality.
  * `2` - Indicates that the operator is unhealthy and there is a loss of 
functionality that should be addressed.
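
As a rough illustration of the mapping above (not taken from the proposal), 
the three values could be modelled as an enum and exposed as a single gauge 
sample. The names `HealthStatus`, `DEGRADED`, `render_health_metric` and the 
`example` prefix are hypothetical; a real operator would more likely use a 
Prometheus client library than format the text by hand.

```python
from enum import IntEnum


class HealthStatus(IntEnum):
    """Proposed gauge values; the member names are illustrative only."""
    HEALTHY = 0    # working as expected
    DEGRADED = 1   # issues that could lead to loss of functionality
    UNHEALTHY = 2  # functionality is already lost


def render_health_metric(operator_prefix: str, status: HealthStatus) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    name = f"{operator_prefix}_operator_health_status"
    return (
        f"# TYPE {name} gauge\n"
        f"{name} {int(status)}\n"
    )


# "example" is a placeholder for <operator-name-prefix>.
print(render_health_metric("example", HealthStatus.DEGRADED))
```

Because the values are ordered by severity, a higher number always means a 
worse state, which is what makes numeric comparisons on the metric meaningful.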

There is disagreement about this, since I could not find any existing 
examples of a health metric that has a third value.
Usually these metrics are Boolean (healthy/unhealthy), or the status is 
stated in the metric name.

The reviewers believe it is not recommended to have more than 2 possible 
values (i.e. the metric should be Boolean).
I see a few issues with this:
1. The metric is sent from different operators and it would be problematic 
to have a label to indicate the level of health in a consistent way.
2. I don't see an issue with querying Prometheus with more than 2 values. 
It might be more efficient than filtering with labels.
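
To make point 2 concrete: in PromQL, a selector such as 
`{__name__=~".+_operator_health_status"} > 0` would return every operator 
whose value is above healthy, with no status label involved. The sketch 
below only imitates that filtering in plain Python over made-up samples; 
the operator names and the `operators_needing_attention` helper are 
hypothetical, and it says nothing about actual query performance.

```python
import re
from typing import Dict

# Prometheus regex matchers are fully anchored, so fullmatch() is used below.
HEALTH_METRIC = re.compile(r".+_operator_health_status")


def operators_needing_attention(
    samples: Dict[str, float], threshold: float = 1
) -> Dict[str, float]:
    """Keep health metrics whose value is at or above the threshold."""
    return {
        name: value
        for name, value in samples.items()
        if HEALTH_METRIC.fullmatch(name) and value >= threshold
    }


# Made-up samples from three independently developed operators.
samples = {
    "alpha_operator_health_status": 0,
    "beta_operator_health_status": 1,
    "gamma_operator_health_status": 2,
    "alpha_reconcile_total": 57,  # unrelated metric, ignored by the suffix match
}

print(operators_needing_attention(samples))
# {'beta_operator_health_status': 1, 'gamma_operator_health_status': 2}
```

Raising the threshold to 2 would keep only operators that have already lost 
functionality, which is the kind of query a Boolean metric cannot express.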

I would appreciate your insights on this, considering that the metric is 
sent from multiple sources that are all developed separately.

Thank you,
Shirly Radco

[1] I proposed a different prefix with the same suffix because I know there 
is an issue with sending the same metric name to Prometheus with different 
help text.
Since we can't enforce the help text to be exactly the same, the shared 
suffix should be enough to display all the operators' health metrics in 
the same panel.
It would also make it easier to identify the origin of metrics that have an 
issue.
