Thank you Brian for your help with this.
I really appreciate it.

Shirly

On Sunday, March 26, 2023 at 5:44:25 PM UTC+3 Brian Candler wrote:

> > There is a disagreement about this, since there are no examples that I 
> > could find that have a third optional value.
>
> The closest example I can think of is Nagios plugins (0=ok, 1=warning, 
> 2=critical, 3=unknown). See nrpe_exporter:
> https://github.com/canonical/nrpe_exporter
> https://www.robustperception.io/nagios-nrpe-prometheus-exporter
>
> And, I guess things like ifOperStatus from snmp_exporter.
>
> I'd say having a 0/1/2 status isn't necessarily "wrong", and in Grafana 
> you can map these numbers to strings and/or colours.
>
> However, you're also right to say this isn't normal recommended practice. 
> Typically you'd have a set of timeseries and set one to 1 and the others 
> to 0. Client libraries tend to call this group of metrics an "enum", e.g.
> https://github.com/prometheus/client_python#enum
>
> I wouldn't worry about efficiency. Prometheus timeseries are very cheap, 
> especially when the metric values are mostly constant.
>
> On Sunday, 26 March 2023 at 07:24:53 UTC+1 Shirly Radco wrote:
>
>> Hi, 
>>
>> *Short summary:*
>> Can we have a metric that reports 3 values (0/1/2), to indicate status 
>> instead of using labels or adding the status to the metric name?
>>
>> *Full story:*
>> I'm working on creating a general recommendation for reporting a 
>> Kubernetes operator health metric.
>>
>> The full proposal is here:
>> https://github.com/operator-framework/operator-sdk/pull/6315/files
>>
>> I proposed recommending that operators add a new health metric with 
>> the following naming:
>> *<operator-name-prefix>_operator_health_status *[1]
>>
>> I proposed that the values of this metric would indicate the health 
>> status:
>>   * `0` - Indicates that the operator is healthy and working as expected.
>>   * `1` - Indicates that the operator has some issues that need to be 
>> addressed and can potentially lead to loss of functionality.
>>   * `2` - Indicates that the operator is unhealthy and there is a loss of 
>> functionality that should be addressed.
>>
>> There is a disagreement about this, since there are no examples that I 
>> could find that have a third optional value.
>> Usually these metrics are represented as Boolean (Healthy/Unhealthy) or 
>> the status is stated in the metric name.
>>
>> The reviewers believe it is not recommended to have more than 2 possible 
>> values (Boolean).
>> I see a few issues with this:
>> 1. The metric is sent from different operators and it would be 
>> problematic to have a label to indicate the level of health in a consistent 
>> way.
>> 2. I don't see an issue with querying Prometheus with more than 2 values. 
>> It might be more efficient than filtering with labels.
>>
>> I would appreciate your insights on this, considering that the metric is 
>> sent from multiple sources that are all developed separately. 
>>
>> Thank you,
>> Shirly Radco
>>
>> [1] I proposed a different prefix and same suffix since I know there is 
>> an issue with sending the same metric name to Prometheus with a different 
>> help text.
>> Since we can't enforce the help text to be exactly the same, the suffix 
>> should be enough to display all the operators' health metrics in 
>> the same panel.
>> Also, it would be easier to identify the origin of metrics that have an 
>> issue.
>
>
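
As a rough illustration of the two encodings discussed in this thread (a single 0/1/2 gauge versus the "enum" pattern of one series per state), here is a minimal stdlib-only Python sketch that only builds the text exposition lines. The state names, the `state` label, and the `myop` prefix are assumptions made up for the example; the proposal defines only the numeric values, and real client libraries pick their own label name.

```python
# Hypothetical state names: the thread only defines the numeric values 0, 1 and 2.
STATES = ["healthy", "degraded", "unhealthy"]


def gauge_encoding(prefix: str, state: str) -> list[str]:
    """Single series whose value is the state's index (the 0/1/2 proposal)."""
    return [f"{prefix}_operator_health_status {STATES.index(state)}"]


def enum_encoding(prefix: str, state: str) -> list[str]:
    """One series per state: the current state is 1, all the others are 0."""
    return [
        f'{prefix}_operator_health_status{{state="{name}"}} {int(name == state)}'
        for name in STATES
    ]


if __name__ == "__main__":
    # "myop" is a made-up operator prefix used only for this sketch.
    print("\n".join(gauge_encoding("myop", "degraded")))
    print("\n".join(enum_encoding("myop", "degraded")))
```

With the gauge form, a query would compare the value (for example `myop_operator_health_status == 2`); with the enum form it would select on the label (for example `myop_operator_health_status{state="unhealthy"} == 1`). Which one is easier to keep consistent across independently developed operators is the judgment call discussed above.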
