P.S. Your expression
> expr: 100 -
>   ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRightHosts"} * 100)
>   / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRightHosts"})
>   >= 85
can be simplified to:
> expr: 100 -
>   ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRightHosts"} * 100)
>   / node_filesystem_size_bytes) >= 85
That's because the result instant vector for an expression like "foo / bar"
only includes entries whose label sets match exactly on the left- and
right-hand sides; any others are dropped silently. (This form may be
slightly less efficient, but I wouldn't expect it to be a problem unless
you have hundreds of thousands of filesystems.)
I would be inclined to simplify it further to:
> expr:
>   node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRightHosts"}
>   / node_filesystem_size_bytes < 0.15
You can use {{ $value | humanizePercentage }} in your alert annotations to
show readable percentages.
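For example, a complete rule using that could look like this (the alert
name, "for" duration and annotation text are just placeholders):

  - alert: RootFilesystemLow
    expr: node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} / node_filesystem_size_bytes < 0.15
    for: 15m
    annotations:
      summary: 'Only {{ $value | humanizePercentage }} free on / of {{ $labels.instance }}'

Since the expression's value is the free-space ratio (e.g. 0.12),
humanizePercentage renders it as a readable "12%".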
On Wednesday, 26 April 2023 at 08:14:35 UTC+1 Brian Candler wrote:
> > I guess with (2) you also meant having a route which is then permanently
> muted?
>
> I'd use a route with a null receiver (i.e. a receiver which has no
> <transport>_configs under it)
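> As a sketch of what that might look like in alertmanager.yml (the "owner"
> label and the receiver name "null" are invented for illustration; "null"
> is just a naming convention, not anything special):
>
>   route:
>     receiver: default
>     routes:
>       - matchers:
>           - owner = "foreign"
>         receiver: 'null'
>   receivers:
>     - name: default
>     - name: 'null'   # no email_configs/webhook_configs etc., so matching alerts go nowhere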
>
> > b) The idea that I had above:
> > - using <alert_relabel_configs> to filter on the instances and add a
> label if it should be silenced
> > - use only that label in the expr instead of the full regex
> > But would that even work?
>
> No, because as far as I know alert_relabel_configs is done *after* the
> alert is generated from the alerting rule. It's only used to add extra
> labels before sending the generated alert to alertmanager. (It occurs to me
> that it *might* be possible to use 'drop' rules here to discard alerts;
> that would be a very confusing config IMO)
>
> > For me it's really like this:
> > My Prometheus instance monitors:
> > - my "own" instances, where I need to react on things like >85% usage on
> root filesystem (and thus want to get an alert)
> > - "foreign" instances, where I just get the node exporter data and show
> e.g. CPU usage, IO usage, and so on as a convenience to users of our
> cluster - but any alert conditions wouldn't cause any further action on my
> side (and the guys in charge of those servers have their own monitoring)
>
> In this situation, and if you are using static_configs or file_sd_configs
> to identify the hosts, then I would simply use a target label (e.g.
> "owner") to distinguish which targets are yours and which are foreign; or I
> would use two different scrape jobs for self and foreign (which means the
> "job" label can be used to distinguish them)
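> For example, with static_configs that could look like this (the "owner"
> label name, its values and the hostnames are just placeholders):
>
>   scrape_configs:
>     - job_name: 'node'
>       static_configs:
>         - targets: ['myhost1:9100', 'myhost2:9100']
>           labels:
>             owner: 'self'
>         - targets: ['foreignhost1:9100']
>           labels:
>             owner: 'foreign'
>
> and then your alert expressions can select only your own hosts with
> {owner="self"} instead of maintaining an instance regex.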
>
> The storage cost of having extra labels in the TSDB is essentially zero,
> because it's the unique combination of labels that identifies the
> timeseries - the bag of labels is mapped to an integer ID I believe. So
> the only problem is if this label changes often, and to me it sounds like a
> 'local' or 'foreign' instance remains this way indefinitely.
>
> If you really want to keep these labels out of the metrics, then having a
> separate timeseries with metadata for each instance is the next-best
> option. Suppose you have a bunch of metrics with an 'instance' label, e.g.
>
> node_filesystem_free_bytes{instance="bar", ....}
> node_filesystem_size_bytes{instance="bar", ....}
> ...
>
> as the actual metrics you're monitoring, then you create one extra static
> timeseries per host (instance) like this:
>
> meta{instance="bar",owner="self",site="london"} 1
>
> (aside: TSDB storage for this will be almost zero, because of the
> delta-encoding used). These can be created by scraping a static webserver,
> or by using recording rules.
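> As a sketch of the static-webserver variant (hostnames and URL invented;
> honor_labels is needed so the "instance" labels written in the file are
> kept rather than overwritten by the scrape target's own instance label):
>
>   scrape_configs:
>     - job_name: 'meta'
>       honor_labels: true
>       static_configs:
>         - targets: ['metaserver:8080']
>
> with the served text file containing lines such as:
>
>   meta{instance="bar",owner="self",site="london"} 1
>   meta{instance="baz",owner="foreign",site="paris"} 1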
>
> Then your alerting rules can be like this:
>
> expr: |
> (
> ... normal rule here ...
> ) * on(instance) group_left(site) meta{owner="self"}
>
> The join will:
> * Limit alerting to those hosts which have a corresponding 'meta'
> timeseries (matched on 'instance') and which has label owner="self"
> * Add the "site" label to the generated alerts
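> Filled in with a concrete disk-space rule, for instance (the expression
> inside the parentheses is just an example; any alerting expression works):
>
>   expr: |
>     (
>       node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"}
>         / node_filesystem_size_bytes < 0.15
>     ) * on(instance) group_left(site) meta{owner="self"}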
>
> Beware that:
>
> 1. this will suppress alerts for any host which does not have a
> corresponding 'meta' timeseries. It's possible to work around this to
> default to sending rather than not sending alerts, but it makes the
> expressions more complex:
> https://www.robustperception.io/left-joins-in-promql
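> One simple sketch of that workaround is to invert the match, so that only
> hosts explicitly marked as foreign are suppressed and hosts with no 'meta'
> series still alert (at the cost of losing the joined-in "site" label):
>
>   expr: |
>     (
>       ... normal rule here ...
>     ) unless on(instance) meta{owner="foreign"}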
>
> 2. the "instance" labels must match exactly. So for example, if you're
> currently scraping with the default label instance="foo:9100" then you'll
> need to change this to instance="foo" (which is good practice anyway). See
> https://www.robustperception.io/controlling-the-instance-label
>
> (I use some relabel_configs tricks for this; examples posted in this group
> previously)
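> One common variant (just a sketch of one possible version) strips the port
> off the scrape address before it becomes the "instance" label:
>
>   relabel_configs:
>     - source_labels: [__address__]
>       regex: '([^:]+)(?::\d+)?'
>       target_label: instance
>       replacement: '${1}'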
>
> > From all that it seems to me that the "best" solution is either:
> > a) simply making more complex and error prone alert rules, that filter
> out the instances in the first place, like in:
> > expr: 100 -
> >   ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRightHosts"} * 100)
> >   / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRightHosts"})
> >   >= 85
>
> That's not great, because as you observe it will become more and more
> complex over time; and in any case won't work if you want to treat certain
> combinations of labels differently (e.g. stop alerting on a specific
> *filesystem* on a specific host)
>
> If you really don't want to use either of the solutions I've given above,
> then another way is to write some code to preprocess your alerting rules,
> i.e. expand a single template rule into a bunch of separate rules, based on
> your own templates and data sources.
>
> HTH,
>
> Brian.
>
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/ae3aa23e-a67d-41a2-a3c6-805487ec817cn%40googlegroups.com.