If you want to determine whether a target is reachable, you could use the
"*up*" metric, which Prometheus automatically generates for every scrape of
a target (see docs
<https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series>).
It is 1 when the scrape succeeded and 0 when it failed. The alerting
condition could look something like this:

    alert: TargetIsUnreachable
    expr: up == 0
    for: 3m
    labels:
      severity: warning
    annotations:
      title: "Instance {{ $labels.instance }} is unreachable"
      description: "Prometheus is unable to scrape {{ $labels.instance }}. This could indicate the target being down or a network issue."

This will trigger the alert if the "*up*" metric is continuously equal to 0
(in other words, the instance is unreachable) for a period of 3 minutes.
The value of the "*for*" parameter should probably be at least 2 to 3 times
your scrape_interval setting (see docs for reference
<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config>).
It's often advised to add the "*for*" parameter to alerting conditions to
avoid noise from flapping alerts. You wouldn't necessarily want to be
notified if a single scrape fails, say due to a transient network
connectivity problem.

There is also the "*absent*" function (see docs
<https://prometheus.io/docs/prometheus/latest/querying/functions/#absent>),
which you can use to determine whether any series exist for a given metric
name and label combination. You would use it in cases where you want to be
notified if a given metric disappears, for example because the target
itself has disappeared from service discovery (see docs
<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config>).
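
As a rough sketch of the "*absent*" approach, a rule could look something
like the one below. The group name, alert name, and the job label value
"node" are placeholders I made up for illustration, so substitute whatever
your own scrape configuration uses.

```yaml
groups:
  - name: target-presence            # hypothetical group name
    rules:
      - alert: TargetSeriesAbsent    # hypothetical alert name
        # absent() returns a single series with value 1 when the selector
        # matches no series at all, and returns nothing otherwise.
        expr: absent(up{job="node"}) # "node" is a placeholder job name
        for: 3m
        labels:
          severity: warning
        annotations:
          title: "No up series found for job node"
          description: "Prometheus has no up series for job node. The targets may have disappeared from service discovery."
```

Unlike "up == 0", which only fires for targets Prometheus still knows
about, this expression fires when no matching series exists at all, so the
two rules tend to complement each other.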
As for determining if there is an actual problem with Prometheus itself,
that can vary depending on the issue but here's a good list of known
alerting conditions that you can use to monitor the state of Prometheus
instances:
https://samber.github.io/awesome-prometheus-alerts/rules.html#prometheus-self-monitoring
On Wednesday, October 4, 2023 at 10:59:23 PM UTC-4 sri L wrote:
> Hi all,
>
> Can anyone please suggest alert expression for configuring alert rule for
> below condition.
>
> "metric data is not being received by Prometheus and to alert that there
> is an issue with the Prometheus and it is unable to scrape".
>
> Thanks
>
>
>