[prometheus-users] Re: Alert Query

hartfordfive Fri, 06 Oct 2023 06:44:04 -0700

If you're looking to determine if a target is reachable or not, you could 
use the "*up*" metric which is automatically added to the scrape of a given 
target (see docs 
<https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series>).
  
The alerting condition could look something like this:

    alert: TargetIsUnreachable
    expr: up == 0
    for: 3m
    labels:
      severity: warning
    annotations:
      title: Instance {{ $labels.instance }} is unreachable
      description: Prometheus is unable to scrape {{ $labels.instance }}. This could indicate the target being down or a network issue.

This triggers the alert if the "*up*" metric is continuously equal to 0 
(in other words, the instance is unreachable) for 3 minutes. The value of 
the "*for*" parameter should probably be at least 2 to 3 times your 
scrape_interval setting (see docs for reference 
<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config>).
It's often advised to add the "*for*" parameter to alerting conditions to 
avoid noise from flapping alerts: you wouldn't necessarily want to be 
notified if a single scrape fails, say due to a transient network 
connectivity problem.

There is also the "*absent*" function (see docs 
<https://prometheus.io/docs/prometheus/latest/querying/functions/#absent>), 
which you can use to determine whether any series exist for a given metric 
name and label combination. You would use it in cases where you want to be 
notified if a given metric disappears because the target itself has 
disappeared from service discovery 
<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config>.
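
To make the "*absent*" case concrete, here is a rough sketch of what such a 
rule could look like in a rule file. The group name, alert name, the job 
label value "node" and the 5m duration are all placeholders I made up for 
illustration, so adjust them to your own setup:

```yaml
groups:
  - name: target-presence            # hypothetical group name
    rules:
      - alert: TargetMetricsAbsent   # hypothetical alert name
        # absent() returns a single series with value 1 when no series
        # match the selector, and returns nothing otherwise.
        # "node" is a placeholder job name, not something from your config.
        expr: absent(up{job="node"})
        for: 5m                      # assumed value; tune to your scrape_interval
        labels:
          severity: warning
        annotations:
          title: No "up" series found for job {{ $labels.job }}
          description: Prometheus has no "up" series for job {{ $labels.job }}. The target may have disappeared from service discovery.
```

Unlike "up == 0", which needs the target to still be known to Prometheus, 
this one fires when the series is missing altogether.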

As for determining if there is an actual problem with Prometheus itself, 
that can vary depending on the issue but here's a good list of known 
alerting conditions that you can use to monitor the state of Prometheus 
instances:
https://samber.github.io/awesome-prometheus-alerts/rules.html#prometheus-self-monitoring
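
As one example of that kind of self-monitoring rule, a failed configuration 
reload can be caught with the "prometheus_config_last_reload_successful" 
metric that Prometheus exposes about itself. This is only a sketch: it 
assumes Prometheus scrapes its own /metrics endpoint, and the group name, 
alert name, severity and duration are my own placeholders:

```yaml
groups:
  - name: prometheus-self-monitoring      # hypothetical group name
    rules:
      - alert: PrometheusConfigReloadFailed   # hypothetical alert name
        # The metric is 1 after a successful reload and 0 after a failed one.
        expr: prometheus_config_last_reload_successful == 0
        for: 5m                           # assumed value
        labels:
          severity: warning
        annotations:
          title: Prometheus configuration reload failed on {{ $labels.instance }}
          description: The last configuration reload on {{ $labels.instance }} did not succeed, so Prometheus is still running with its previous configuration.
```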

On Wednesday, October 4, 2023 at 10:59:23 PM UTC-4 sri L wrote:

> Hi all,
>
> Can anyone please suggest alert expression for configuring alert rule for 
> below condition.
>
> "metric data is not being received by Prometheus and to alert that there 
> is an issue with the Prometheus and it is unable to scrape".
>
> Thanks
>
>
>

