Good afternoon!
I have a problem. I want to display the remaining disk space in alerts, by
taking the value of the node_filesystem_avail_bytes metric and dividing it by
1024 / 1024 (to convert bytes to MB).
My rule currently looks like this:
- alert: HostOutOfDiskSpace_test
  expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes > 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0 and {instance=~"10.22.22.18.*"}
  for: 1s
  labels:
    severity: high
  annotations:
    summary: "Host {{ $labels.instance }} out of memory \n - Device: {{ $labels.device }} \n - Mountpoint - {{ $labels.mountpoint }}"
    description: "Node memory is filling up. Value = {{ $value | printf `%.2f` }} ({{ printf \"node_filesystem_avail_bytes{mountpoint='%s'}\" .Labels.mountpoint | query | first | value | humanize1024 }})"
But the value is displayed incorrectly. Prometheus shows the correct values
for the node_filesystem_avail_bytes metric, but the value in the notification
is wrong. Here are examples of notifications from Telegram:
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: /dev/sda1
- Mountpoint - /boot
Description: Node memory is filling up. Value = 72.62 (736.4Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: tmpfs
- Mountpoint - /run
Description: Node memory is filling up. Value = 98.14 (298.2Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: tmpfs
- Mountpoint - /run/user/1000
Description: Node memory is filling up. Value = 100.00 (298.8Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: /dev/mapper/rhel-root
- Mountpoint - /
Description: Node memory is filling up. Value = 87.08 (31.48Gi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
At the same time, Prometheus shows the following values:
{device="/dev/mapper/rhel-root", fstype="xfs", instance="10.22.22.181:9100", mountpoint="/"} 15146.296875
{device="/dev/sda1", fstype="xfs", instance="10.22.22.181:9100", mountpoint="/boot"} 736.39453125
{device="tmpfs", fstype="tmpfs", instance="10.22.22.181:9100", mountpoint="/run"} 872.640625
{device="tmpfs", fstype="tmpfs", instance="10.22.22.181:9100", mountpoint="/run/user/1000"} 177.83984375
Here is the output from the OS:
Filesystem             Size  Used Avail Use% Mounted on
devtmpfs               871M     0  871M   0% /dev
tmpfs                  890M     0  890M   0% /dev/shm
tmpfs                  890M   17M  873M   2% /run
tmpfs                  890M     0  890M   0% /sys/fs/cgroup
/dev/mapper/rhel-root   17G  2.2G   15G  13% /
/dev/sda1             1014M  278M  737M  28% /boot
tmpfs                  178M     0  178M   0% /run/user/1000
After reading the official documentation on the prometheus.io website, I
concluded that the problem is in the conversion of the data. I need one
rule to work for different devices and mount points. The construct {{
printf \"node_filesystem_avail_bytes{mountpoint='%s'}\" .Labels.mountpoint
| query | first | value | humanize1024 }} does what I want in principle, but
the value it returns for a given mountpoint comes out wrong after
humanize1024. The humanize function gives results that are very far from the
real values, so I am not considering it.
Maybe someone has come across this before. How can I display the
node_filesystem_avail_bytes value for a specific device and mount point
without using the humanize1024 function, simply dividing by 1024 / 1024 or
using some other conversion to MB or GB?
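For illustration, here is an untested sketch of the kind of annotation I am after. It rests on two assumptions I have not confirmed: that the division can be written directly inside the query string passed to `query`, and that the inner query has to match on instance and device as well as mountpoint, so that `first` returns the series of the host that raised the alert rather than the same mountpoint on another host. The label names are the ones already used in the rule above.

```yaml
# Sketch only: a variant of the description annotation, not a confirmed fix.
annotations:
  # The folded scalar (>-) joins the lines below with single spaces.
  description: >-
    Node memory is filling up. Value = {{ $value | printf `%.2f` }}
    ({{ printf "node_filesystem_avail_bytes{instance='%s',device='%s',mountpoint='%s'} / 1024 / 1024" $labels.instance $labels.device $labels.mountpoint | query | first | value | printf `%.0f` }} MB)
```

If this works as assumed, the number in brackets would be the available space in units of 1024 * 1024 bytes, rounded to a whole number. Appending one more `/ 1024` to the query and using `%.2f` would give GB instead.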
Thank you for your responses!
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/a0ba6ac0-6f84-429d-9a14-b5934e3a59b7n%40googlegroups.com.