[prometheus-users] Re: silence wasn't applied

2025-01-16 Thread 'Brian Candler' via Prometheus Users
I don't know, but you could look at the source code of projects like https://github.com/prymitive/karma and alerta.io to see how they interface with the API. Also: https://stackoverflow.com/questions/75283114/prometheus-alertmanager-api-to-manage-silences On Thursday, 16 January 2025 at 11:50:

[prometheus-users] Re: silence wasn't applied

2025-01-13 Thread 'Brian Candler' via Prometheus Users
> from a look in the AlertManager UI no silence was created, and i got resolved notification after 5 minutes since the fired notification. ... > I wonder why the silence wasn't able to create? (not the first time it happens) > Maybe it's some kind of a race condition? we can't silence alerts whi

[prometheus-users] Re: Alert manager receiver config.

2025-01-07 Thread 'Brian Candler' via Prometheus Users
You should remove the two sections that I highlighted. The following is a valid config, and is accepted: 8< global: smtp_hello: 'example.com' route: group_wait: 1m group_interval: 5m repeat_interval: 12h group_by: ["alertname", "severity"] # Default receiver (for all alert

[prometheus-users] Re: Alert manager receiver config.

2025-01-07 Thread 'Brian Candler' via Prometheus Users
There are two major problems with that config: - there is a "routes" block nested under "inhibit_rules", where it is not allowed - there is a "matchers" block nested under "receivers", where it is not allowed And if I try to start alertmanager 0.25.0 (the version you say you're using) with that

[prometheus-users] Re: Alert manager receiver config.

2025-01-06 Thread 'Brian Candler' via Prometheus Users
Then something strange is going on. Are you 100% sure that you pasted *exactly* the configuration file you're using, with *exactly* the correct contents and indentation? You can add labels in your service discovery, e.g. if you are using file_sd_configs then inside the targets file you can put

[prometheus-users] Re: Alert manager receiver config.

2025-01-03 Thread 'Brian Candler' via Prometheus Users
I have retested this with alertmanager 0.25.0, and it still rejects the config file you supplied as invalid, refusing to start. Therefore, I think you're *not* running with the configuration file you think you are. This is the first issue to resolve. You can prove this to yourself by running a

[prometheus-users] Re: Alert manager receiver config.

2025-01-02 Thread 'Brian Candler' via Prometheus Users
What version of alertmanager are you using? Your configuration is invalid, so I don't know how you're even getting alertmanager to start. You have a top-level "routes:" block (before "inhibit_rules"), and you have another "routes:" block nested under inhibit_rules, but neither of these are allo

[prometheus-users] Re: Is there a way to get resolved alerts via the API?

2025-01-02 Thread 'Brian Candler' via Prometheus Users
AFAIK, alertmanager simply notices that the alert has disappeared, and sends a notification when that happens. That is, it retains some internal state about previously firing alerts (which it needs to for other reasons too, e.g. repeat_interval) and compares current to previous. But once it's g

[prometheus-users] Re: Alert manager config

2024-12-30 Thread 'Brian Candler' via Prometheus Users
Please repost the config with correct formatting to make valid YAML: that is, with the correct horizontal indentation (which is critical), without the line numbers, and not in colour (the colons and brackets are white-on-white, making them invisible) On Monday, 30 December 2024 at 19:32:59 UTC

[prometheus-users] Re: Prometheus Disk usage (BlackBox Exporter + AlertManager)

2024-12-28 Thread 'Brian Candler' via Prometheus Users
Go into the Prometheus web interface (PromQL query editor), type "windows_logical_disk_free_bytes", and look at the vector of results you get. I don't use Windows, but I'm guessing you'll see something like: windows_logical_disk_free_bytes{instance="server1", filesystem="c:"} 12345 windows_logi

[prometheus-users] Re: Is Prometheus and PromQL suitable for working on a metric that doesn't change much?

2024-12-20 Thread 'Brian Candler' via Prometheus Users
You wouldn't even need the "sum" operator, since that sums across multiple timeseries. There would be a single import_process_total timeseries. Queries might be: import_process_total - import_process_total offset 1h rate(import_process_total[1h]) ... etc On Friday, 20 December 2024 at 13:03:04

[prometheus-users] Re: Is Prometheus and PromQL suitable for working on a metric that doesn't change much?

2024-12-20 Thread 'Brian Candler' via Prometheus Users
I see at least two distinct issues there. 1. "Is Prometheus and PromQL suitable for working on a metric that doesn't change much?" - quite simply, "yes". Prometheus uses delta compression, so adjacent identical values compress extremely well. Indeed, Prometheus is often used for metrics which

[prometheus-users] Re: Alert manager sending alerts, if one poll exists inside the 5m period.

2024-12-19 Thread 'Brian Candler' via Prometheus Users
As a starting point, put the expression "probe_icmp_duration_seconds == 0" into the PromQL web browser in Prometheus, and zoom into the expected time area. What do you see? One possible issue is if the timeseries appears and disappears; 5 minutes happens to be the default staleness interval (lo

[prometheus-users] Re: Alertmanager email issues

2024-12-19 Thread 'Brian Candler' via Prometheus Users
> I am running kube-prometheus on K8s cluster - installed with helm. Software versions? > When I check the UI Which UI are you referring to? Prometheus' web UI has an "alerts" section, and Alertmanager's web UI also has an "alerts" section. Do they both show the active PodErrorAlert? Check th

[prometheus-users] Re: Advice on where/how to write a new niche-ish blackbox exporter probe?

2024-12-14 Thread 'Brian Candler' via Prometheus Users
If you want to minimize your work, you can write a test as a one-shot standalone program in any language of your choice, and either: 1. Run it from cron, write the results to a file, and pick them up by node_exporter textfile collector; OR 2. Run it on demand from exporter_exporter

[prometheus-users] Re: Help needed for using Node Exporter for parsing CSV file

2024-12-13 Thread 'Brian Candler' via Prometheus Users
> do we have any existing node exporter which can read csv file? No. But if you write a simple script to read the csv file and write it back out in openmetrics format, then you can use node_exporter's textfile collector. For example: total_bytes{interface_name="abc"} 600 incoming_bytes{interfac
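
A minimal sketch of the "simple script" suggested above, assuming a CSV with an interface_name column and one numeric column per metric. The column names, the function name csv_to_metrics and the sample data are assumptions for illustration; the snippet is truncated before it shows the full layout.

```python
import csv
import io


def csv_to_metrics(csv_text: str, label_column: str = "interface_name") -> str:
    """Turn each CSV row into one exposition line per numeric column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape backslashes and double quotes inside the label value.
        label_value = row[label_column].replace("\\", "\\\\").replace('"', '\\"')
        for column, value in row.items():
            if column == label_column:
                continue
            float(value)  # raises ValueError if the cell is not numeric
            lines.append(f'{column}{{{label_column}="{label_value}"}} {value.strip()}')
    return "\n".join(lines) + "\n"


# Hypothetical input; the column names are assumptions for illustration.
sample = "interface_name,total_bytes,incoming_bytes\nabc,600,200\n"
print(csv_to_metrics(sample), end="")
```

In practice the output would be written to a .prom file in the directory that node_exporter's --collector.textfile.directory flag points at, ideally via a temporary file and a rename so that a scrape never sees a half-written file.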

[prometheus-users] Re: Offset alert never clearing

2024-12-13 Thread 'Brian Candler' via Prometheus Users
> I do not really understand how expr works in prom rules - is it something that simply evaluates to either 1 or 'true' as a go bool type? No. It's not boolean logic at all. PromQL works with *vectors*: a vector contains zero or more values, each with a distinct set of labels. An alert fires wh

[prometheus-users] Re: Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible.

2024-12-06 Thread 'Brian Candler' via Prometheus Users
A bit more context would be helpful. Did you write: matchers: - applicationid=~"^SNSVC\d{7}$" or something else? Testing with alertmanager 0.27, I think the problem is around handling of the backslash. The following is accepted by amtool check-config: matchers: - applicationid=~"^SNSVC\\d{

[prometheus-users] Re: Alertmanager is moving to a new parser for labels and matchers, and this input is incompatible.

2024-12-06 Thread 'Brian Candler' via Prometheus Users
I checked at https://prometheus.io/docs/alerting/latest/configuration/#matcher AFAICS, it doesn't explicitly mention the behaviour of backslashes in UTF-8 matches. There is a specific note about the fallback behaviour of classic matchers

[prometheus-users] Re: When computing, how to handle null values in PromQL.

2024-12-06 Thread 'Brian Candler' via Prometheus Users
"(A + B) or A or B" is correct. Remember that "A + B" is not a scalar arithmetic expression. It's a vector expression: - for every element in vector A and B which have exactly matching label sets (apart from __name__), calculate the sum and put it in the result vector with the same set of label
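
The vector semantics described above can be modelled in a few lines. This is only a sketch: an instant vector is represented as a dict keyed by its label set (metric name already dropped), and the helper names labels, add and or_ are invented for the illustration, not part of any Prometheus API.

```python
def labels(**kv):
    """Build a hashable label set, e.g. labels(instance="a")."""
    return frozenset(kv.items())


def add(a: dict, b: dict) -> dict:
    """Model `A + B`: only label sets present on both sides produce a result."""
    return {k: a[k] + b[k] for k in a if k in b}


def or_(a: dict, b: dict) -> dict:
    """Model `A or B`: everything in A, plus elements of B whose labels are not in A."""
    out = dict(a)
    for k, v in b.items():
        out.setdefault(k, v)
    return out


A = {labels(instance="a"): 1, labels(instance="b"): 2}
B = {labels(instance="b"): 10, labels(instance="c"): 20}

# (A + B) or A or B
result = or_(or_(add(A, B), A), B)
for label_set, value in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(dict(label_set), value)
```

Here instance "b" exists on both sides and is summed to 12, while "a" and "c" fall through from A and B unchanged, which is the effect of treating a missing value as zero.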

[prometheus-users] Re: Drop labels of a metric not entire job

2024-11-20 Thread 'Brian Candler' via Prometheus Users
Checking in the source code: if c.Action == LabelDrop || c.Action == LabelKeep { if c.SourceLabels != nil || c.TargetLabel != DefaultRelabelConfig.TargetLabel || c.Modulus != DefaultRelabelConfig.Modulus ||

[prometheus-users] Re: Thanos Query APIs are returning different values, depending upon the selection of start time and end time.

2024-11-12 Thread 'Brian Candler' via Prometheus Users
Can you show what exactly is different in the output between the two cases, that you don't expect to be different? On Tuesday 12 November 2024 at 17:13:26 UTC Kishore Kumar wrote: > Hello Prometheus Community, > I hope the person reading this has an awesome day, and thanks > for helpi

Re: [prometheus-users] Re: Can any one help me with the alertmanager.yml with proxy ?

2024-11-06 Thread 'Brian Candler' via Prometheus Users
o you have > any suggestion for this scenario than changing the mail server itself ? > > Thanks > > On Tue, 5 Nov 2024 at 6:13 PM, 'Brian Candler' via Prometheus Users < > promethe...@googlegroups.com> wrote: > >> What makes you think the problem would be fixe

[prometheus-users] Re: Can any one help me with the alertmanager.yml with proxy ?

2024-11-05 Thread 'Brian Candler' via Prometheus Users
What makes you think the problem would be fixed by a proxy - what *exact* error message are you seeing? And what sort of proxy are you thinking of? If it's an SMTP proxy then you just point alertmanager at it for sending mails. If it's a SOCKS5 proxy then that functionality is not built-in to a

Re: [prometheus-users] Re: Customizing Alertmanager Notifications for Telegram

2024-10-24 Thread 'Brian Candler' via Prometheus Users
On Thursday 24 October 2024 at 16:13:06 UTC+1 Chris Siebenmann wrote: As a counterpoint: we send resolved alerts so that we can know when a problem stopped as well as when it started (which helps for diagnosis) Fair enough, although I will mention that the historical alert information is also

[prometheus-users] Re: Issue with resolved alerts not sending notifications

2024-10-24 Thread 'Brian Candler' via Prometheus Users
On Wednesday 23 October 2024 at 16:26:30 UTC+1 mohammad md wrote: - - *annotations: summary: "High CPU usage on {{ $labels.Host }} for {{ $labels.Client }} ({{ $value }})" description: "CPU usage on {{ $labels.Host }} for {{ $labels.Client }} has exceeded 70% for 5 minu

[prometheus-users] Re: Customizing Alertmanager Notifications for Telegram

2024-10-24 Thread 'Brian Candler' via Prometheus Users
On Wednesday 23 October 2024 at 16:26:30 UTC+1 bashar madani wrote: The issue I’m facing is that Alertmanager keeps repeating the FIRING message even after the issue is resolved. I want to ensure that only the RESOLVED message is sent when the problem is fixed. If you have a group of alerts, a

[prometheus-users] Re: WAL configuration to extend retention window

2024-10-18 Thread 'Brian Candler' via Prometheus Users
You could have a look at using Grafana Alloy to buffer prometheus remote_write: https://grafana.com/docs/alloy/latest/reference/components/prometheus/prometheus.receive_http/#example It has its own WAL with adjustable retention: https://grafana.com/docs/alloy/latest/reference/components/prometheu

[prometheus-users] Re: WAL configuration to extend retention window

2024-10-18 Thread 'Brian Candler' via Prometheus Users
I'm not aware that you can. The issue regarding the limitation is still open: https://github.com/prometheus/prometheus/issues/9607 which is linked from here: https://prometheus.io/blog/2021/11/16/agent/ "This is currently limited to a two-hour buffer only, similar to non-agent Prometheus, hopefu

[prometheus-users] Re: metric relabeling of a label to __name__ does not work

2024-10-14 Thread 'Brian Candler' via Prometheus Users
Ah yes, I hadn't thought of that. The default for regex is "(.*)" not "(.+)" so it will match empty string. Glad it's working for you now! On Monday 14 October 2024 at 12:21:02 UTC+1 M shr wrote: > yes
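
The difference between the two patterns is easy to check outside Prometheus. The sketch below uses Python's re module as a stand-in for the relabel regex engine, on the assumption that fullmatch is a fair model of a regex anchored at both ends; the function name rule_applies is invented for the illustration.

```python
import re


def rule_applies(regex: str, source_value: str) -> bool:
    """Relabel regexes are anchored at both ends, so fullmatch models them."""
    return re.fullmatch(regex, source_value) is not None


# A label that is absent on a metric is seen as the empty string.
print(rule_applies("(.*)", ""))     # True: the default regex matches an empty value
print(rule_applies("(.+)", ""))     # False: at least one character is required
print(rule_applies("(.+)", "foo"))  # True
```

So a rule that copies a label into __name__ with the default regex also applies to metrics that lack the label, whereas "(.+)" restricts it to metrics where the label is actually set.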

[prometheus-users] Re: metric relabeling of a label to __name__ does not work

2024-10-14 Thread 'Brian Candler' via Prometheus Users
Have a look at the "Status > Targets" menu of the prometheus web interface. It may tell you of scraping errors. For example, I think it's very likely with that renaming rule that you could end up with duplicate metrics after the renaming, and that will cause scrapes to fail (and therefore drop t

Re: [prometheus-users] Re: CAN I CONFIGURE TWO EMAIL RECIEVER FOR ALERTMANAGER

2024-10-11 Thread 'Brian Candler' via Prometheus Users
d_email > reciever configured for two email addresses? > > On Thu, 10 Oct 2024 at 13:32, 'Brian Candler' via Prometheus Users < > promethe...@googlegroups.com> wrote: > >> route: >> receiver: send_email >> routes: >> - receiver: send_email2 >

Re: [prometheus-users] Re: CAN I CONFIGURE TWO EMAIL RECIEVER FOR ALERTMANAGER

2024-10-10 Thread 'Brian Candler' via Prometheus Users
route: receiver: send_email routes: - receiver: send_email2 That route will only ever send to send_email2. Why? "routes" are child routes of this route. When processing a given routing rule, alertmanager scans through all the child routes in turn, and the first one which matches is used.
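
The first-match behaviour described above can be sketched as a small tree walk. This is a simplified model, not Alertmanager's code: the Route class and resolve function are invented for the illustration, matchers are reduced to exact label equality, and the continue_ field stands for Alertmanager's "continue" route setting, which is not mentioned in the quoted text.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    matchers: dict = field(default_factory=dict)  # empty means "matches everything"
    routes: list = field(default_factory=list)    # child routes
    continue_: bool = False                       # keep scanning siblings after a match


def resolve(route: Route, alert: dict) -> list:
    """Return the receivers an alert ends up at, scanning children in order."""
    receivers = []
    for child in route.routes:
        if all(alert.get(k) == v for k, v in child.matchers.items()):
            receivers.extend(resolve(child, alert))
            if not child.continue_:
                break  # the first matching child wins
    # If no child matched, the alert stays at this route's own receiver.
    return receivers or [route.receiver]


alert = {"alertname": "Example"}

# The config from the thread: the child has no matchers, so it always wins.
root = Route("send_email", routes=[Route("send_email2")])
print(resolve(root, alert))  # ['send_email2']

# One way to reach both receivers: an extra child with continue set.
both = Route("send_email", routes=[Route("send_email", continue_=True), Route("send_email2")])
print(resolve(both, alert))  # ['send_email', 'send_email2']
```

The simpler alternative, if both addresses should always get the same alerts, is a single receiver whose email_configs list has two entries.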

[prometheus-users] Re: CAN I CONFIGURE TWO EMAIL RECIEVER FOR ALERTMANAGER

2024-10-10 Thread 'Brian Candler' via Prometheus Users
Please show the configuration you have tried, to make it clearer what you're trying to do, and then we can help you correct it. email_configs takes a YAML list of email_config entries (see docs), so a single receiver can have

Re: [prometheus-users] Re: ALERTMANAGER NOT RUNNING

2024-10-08 Thread 'Brian Candler' via Prometheus Users
th no errors. > > On Tue, 10 Sept 2024 at 20:38, 'Brian Candler' via Prometheus Users < > promethe...@googlegroups.com> wrote: > >> > Yes, but i do not know why when trying to start Alertmanager it tells >> me the port is already in use and can’t start. >

[prometheus-users] Re: K6_http_req_duration_$quantile_stat Metrics are the Same Across Quantiles for Certain APIs

2024-10-04 Thread 'Brian Candler' via Prometheus Users
On Friday 4 October 2024 at 01:59:29 UTC+1 Zhang Zhao wrote: When running a specific test case and switching the trend metric query to different quantile values in Grafana, the panels don't update properly. I think you should first remove Grafana from the equation entirely. If the problem is s

[prometheus-users] Re: Synology NAS Details Dashboard

2024-09-30 Thread 'Brian Candler' via Prometheus Users
First, make sure you can reproduce the existing generator.yml -> snmp.yml using the Makefile to download MIBs. You should be able to reproduce exactly what the supplied snmp.yml has. Once you have that working, then try your own generator.yml and compare the generated snmp.yml. Drilling down th

[prometheus-users] Re: Synology NAS Details Dashboard

2024-09-30 Thread 'Brian Candler' via Prometheus Users
I mean the MIB files consumed by generator. On Monday 30 September 2024 at 14:41:14 UTC+1 Mitchell Laframboise wrote: > I know the mib is working because when I do an snmpwalk i get the > following output. > > ~/snmp_exporter/generator$ snmpwalk -v2c -c public *.*.*.* > 1.3.6.1.4.1.6574.3 > SNM

[prometheus-users] Re: Synology NAS Details Dashboard

2024-09-30 Thread 'Brian Candler' via Prometheus Users
> Since the generator.yml has that metric in overrides, shouldn't it be generated? No. Overrides only change how a metric is rendered; if there's no matching metric in the MIB then there's nothing to override. On Monday 30 September 2024 at 13:45:57 UTC+1 Brian Candler wrote: > > I looked at t

[prometheus-users] Re: Synology NAS Details Dashboard

2024-09-30 Thread 'Brian Candler' via Prometheus Users
> I looked at the sample snmp.yml from Github that I assume is generated from the default generator.yml and I see that the "raidTotalSize" metric is included, but when I check my snmp.yml that metric isn't included. Either something is different in your generator.yml, or something is different

[prometheus-users] Re: Synology NAS Details Dashboard

2024-09-30 Thread 'Brian Candler' via Prometheus Users
I can't see what you're looking at, because: 1. You've shown your generator.yml, but you've not shown the snmp.yml output that generator creates. 2. You've not said how the output snmp.yml is different from the supplied snmp.yml 3. You've not said what version of snmp_exporter you're using, so I

[prometheus-users] Re: Synology NAS Details Dashboard

2024-09-29 Thread 'Brian Candler' via Prometheus Users
> I am successful in querying the metrics in Prometheus Which ones in particular *are* you able to see? > I did some more queries and found that I'm unable to return ifName? Please explain exactly what you're doing when you say "unable to return". If you're going to the Prometheus web interfac

[prometheus-users] Re: Synology NAS Details Dashboard

2024-09-29 Thread 'Brian Candler' via Prometheus Users
First, do a query in the Prometheus web interface (for example, just "ifPhysAddress"). If you see no answers, then you need to drill down into your metrics collection. Check the query "up" to see if SNMP scraping is successful. If it's not, then check logs from snmp_exporter ("journalctl -eu sn

[prometheus-users] Re: promql stat functions return identical values

2024-09-27 Thread 'Brian Candler' via Prometheus Users
> e.g. Grafana can quite happily render 0-1 as 0-100% and in alerting rules: - expr: blah > 0.9 annotations: summary: 'filesystem usage is high: {{ $value | humanizePercentage }}' -- You received this message because you are subscribed to the Google Groups "Prometheus Users" group. To un

[prometheus-users] Re: promql stat functions return identical values

2024-09-27 Thread 'Brian Candler' via Prometheus Users
Perform the two halves of the query separately, i.e. max_over_time(node_filesystem_avail_bytes{...}[1h]) max_over_time(node_filesystem_size_bytes{...}[1h]) and then you'll see why they divide to give 48% instead of 97%. I expect node_filesystem_size_bytes doesn't change much, so max_over_time doesn

[prometheus-users] Re: promql stat functions return identical values

2024-09-24 Thread 'Brian Candler' via Prometheus Users
$__rate_interval is (roughly speaking) the interval between 2 adjacent points in the graph, with a minimum of 4 times the configured scrape interval. It's not the entire period over which Grafana is drawing the graph. You probably want $__range or $__range_s. See: https://grafana.com/docs/grafan

Re: [prometheus-users] Re: TLS CONFIGURATION

2024-09-15 Thread 'Brian Candler' via Prometheus Users
severity: warning > equal: > - alertname > - dev > - instance > > The error > ts=2024-09-15T17:58:49.480Z caller=coordinator.go:118 level=error > component=configuration msg="Loadion file failed" > file=/etc/alertmanager/alertmanager.yml err=

[prometheus-users] Re: TLS CONFIGURATION

2024-09-15 Thread 'Brian Candler' via Prometheus Users
Show what you did, and what the error was, and then maybe we can help you. There are some global settings that cover common use cases: https://prometheus.io/docs/alerting/latest/configuration/#file-layout-and-global-settings However, if you need more control (e.g. for client certificate auth or

[prometheus-users] Re: Synology SNMP

2024-09-11 Thread 'Brian Candler' via Prometheus Users
> The job is running so the Dashboard must be broken Quite possibly (many of them are). I suggest you don't go to the Targets screen and click on the scrape URL; that will make a scrape from your browser. Rather, go to the PromQL interface (the main page) and enter some PromQL queries like i

Re: [prometheus-users] Re: ALERTMANAGER NOT RUNNING

2024-09-10 Thread 'Brian Candler' via Prometheus Users
e and saw an http2 error, > meaning Alertmanager is still using its default port. I also tried accessing > Alertmanager via my web interface with the new port no attached, nothing > showed up > > On Tue, 10 Sep 2024 at 19:34, 'Brian Candler' via Prometheus Users < >

[prometheus-users] Re: Synology SNMP

2024-09-10 Thread 'Brian Candler' via Prometheus Users
So when you said "im getting metrics returned", which metrics were you talking about? > the job name specified in the .yml isn't even showing up. In which yml - the Grafana dashboard, the Prometheus scrape config, something else? On Tuesday 10 September 2024 at 20:00:26 UTC+1 Mitchell Laframbo

[prometheus-users] Re: Synology SNMP

2024-09-10 Thread 'Brian Candler' via Prometheus Users
Looking at the source of that dashboard, all the queries are filtered against {job=~'$JobName',instance=~'$Device' and the way the JobName values are chosen is from this query: "name": "JobName", "options": [], "query": { "query": "label_values(ssCpuUser, job)",

[prometheus-users] Re: Synology SNMP

2024-09-10 Thread 'Brian Candler' via Prometheus Users
If you're "getting metrics returned" then either the dashboard is broken, or the metrics you're collecting are not the same as the ones the dashboard is expecting, or the dashboard has some hard-coded assumptions that don't match your environment (e.g. the queries are hard-coded to expect parti

[prometheus-users] Re: Synology SNMP

2024-09-10 Thread 'Brian Candler' via Prometheus Users
Which dashboard? Did you write it yourself, or find one published? There aren't many snmp_exporter dashboards on the grafana hub, but I see several that claim to be for Synology. I made a couple of simple ones for generic if_mib (interface stats): https://grafana.com/grafana/dashboards/12492-sn

[prometheus-users] Re: ALERTMANAGER NOT RUNNING

2024-09-10 Thread 'Brian Candler' via Prometheus Users
alertmanager listens on two ports. By default: --web.listen-address=:9093 --cluster.listen-address=0.0.0.0:9094 On Tuesday 10 September 2024 at 15:25:31 UTC+1 Chinelo Ufondu wrote: > Hello > i have tried again by running this command like you suggested and > specifying a port that the clusters

Re: [prometheus-users] SNMP EXPORTER GENERATOR ERRORS

2024-09-09 Thread 'Brian Candler' via Prometheus Users
I strongly advise you to use node_exporter rather than snmpd for collecting metrics from Linux hosts, unless there's something you can't get any other way (keepalived VRRP might be one example). By "snmpd.conf ... access control" you might be asking how to create SNMPv2 communities and SNMPv3 u

Re: [prometheus-users] SNMP EXPORTER GENERATOR ERRORS

2024-09-09 Thread 'Brian Candler' via Prometheus Users
> msg="Loading MIBs" from=$HOME/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf If you do that, then it's your responsibility to download the mibs you need. "apt-get install snmp-mibs-downloader" on Ubuntu/Debian will get a bunch, but I don't know if you'll have all

[prometheus-users] Re: PromQL redirection

2024-09-06 Thread 'Brian Candler' via Prometheus Users
Have a look at https://github.com/jacksontj/promxy But I don't think it's yet clever enough to avoid querying servers that couldn't possibly match the query. (Presumably it could only do that if your PromQL was specific enough with its labels) On Friday 6 September 2024 at 16:28:10 UTC+1 Samit

[prometheus-users] Re: feed large csv file into the Prometheus

2024-09-05 Thread 'Brian Candler' via Prometheus Users
> Hi Brian, want to import 10GB csv file into the Prometheus, after that try to run different queries to find out how it performs with data with high cardinality. In prometheus, the timeseries data consists of float values and there's no "cardinality" as such. But each timeseries is determined

[prometheus-users] Re: feed large csv file into the Prometheus

2024-09-05 Thread 'Brian Candler' via Prometheus Users
Prometheus is very specific to timeseries data, and normally new data is ingested as of the current time. If you have previous timeseries data that you need to import as a one-time activity, then there is "backfilling", see https://prometheus.io/docs/prometheus/latest/storage/#backfilling-from-o

[prometheus-users] Re: max_over_time not working as expected - want to get the 3 most recent values higher than a specific threshold

2024-09-03 Thread 'Brian Candler' via Prometheus Users
Oops, topk(3, max_over_time(foo[31d] @ 1725145200) ) On Tuesday 3 September 2024 at 09:41:23 UTC+1 Brian Candler wrote: > > If I run "max_over_time{}[1m:15s]" it will show me the peak of every 1m > evaluating every 15s sample. That's ok. > > That expression is almost certainly wrong; it is

[prometheus-users] Re: max_over_time not working as expected - want to get the 3 most recent values higher than a specific threshold

2024-09-03 Thread 'Brian Candler' via Prometheus Users
> If I run "max_over_time{}[1m:15s]" it will show me the peak of every 1m evaluating every 15s sample. That's ok. That expression is almost certainly wrong; it is querying a metric called "max_over_time" (which probably doesn't exist), rather than calling the function max_over_time(...) on an e

[prometheus-users] Re: Drop unsed metrics

2024-09-03 Thread 'Brian Candler' via Prometheus Users
That looks OK to me. I think it should drop all metrics with name "container_processes". Can you show a wider context of the entire scrape job config? metric_relabel_configs is configured under a specific scrape job, and only applies to metrics collected by that scrape job. There are simple th

[prometheus-users] Re: promehtus.yml autogenerated

2024-09-01 Thread 'Brian Candler' via Prometheus Users
That is not part of prometheus. Where did /opt/prom-registry/scripts/update.php come from? That's what's doing it. But I can't find any "prom-registry" PHP code on github or via a google search. Perhaps you obtained Prometheus as part of a third-party application or bundle? On Sunday 1 Septem

Re: [prometheus-users] I am having an error while checking the alert manager status

2024-09-01 Thread 'Brian Candler' via Prometheus Users
Show your config file? On Sunday 1 September 2024 at 18:45:19 UTC+1 Chinelo Ufondu wrote: > I have been able to resolve the issue, the problem was my config file, it > wasn't properly indented and it had some syntax errors > > I tried running alertmanager again and i came across another issue, h

[prometheus-users] Re: max_over_time not working as expected - want to get the 3 most recent values higher than a specific threshold

2024-08-31 Thread 'Brian Candler' via Prometheus Users
Why are you doing a subquery there? max_over_time(metric[1h]) should give you the largest value at any time over that 1h period. The range vector includes all the points in that time period, without resampling. A subquery could be used if you needed to take an instant vector expression and tur

[prometheus-users] Re: I am having an error while checking the alert manager status

2024-08-30 Thread 'Brian Candler' via Prometheus Users
You're not taking a very helpful approach to debugging. If you had shown the full error message, as I suggested, then your issue could probably have been fixed. You didn't say what guide you're following. The official documentation is IMO clear and detailed: https://prometheus.io/docs/alerting/

[prometheus-users] Re: I am having an error while checking the alert manager status

2024-08-28 Thread 'Brian Candler' via Prometheus Users
The important error message has been truncated ("Unable to create..."). You can use left/right arrows to scroll sideways, but it would be better to use these commands: systemctl status alertmanager -l --no-pager journalctl -u alertmanager -n100 --no-pager On Wednesday 28 August 2024 at 18:28:15

[prometheus-users] Re: Changing Port Number and Implementing Authentication for Windows Exporter

2024-08-28 Thread 'Brian Candler' via Prometheus Users
You need to pass some flags to the server process: https://github.com/prometheus-community/windows_exporter?tab=readme-ov-file#flags --web.listen-address lets you change the port --web.config.file lets you point to a config file for setting up HTTP basic auth and/or TLS client certificate auth O

Re: [prometheus-users] Re: My Query never fires an Alarm

2024-08-24 Thread 'Brian Candler' via Prometheus Users
or This alert > rule). I see Alarms for UP. Also, Labels for ALL rules are same! Critical. > > On Fri, Aug 23, 2024 at 2:38 AM 'Brian Candler' via Prometheus Users < > promethe...@googlegroups.com> wrote: > >> > 2. I am getting other Alerts through Alert

Re: [prometheus-users] Re: My Query never fires an Alarm

2024-08-23 Thread 'Brian Candler' via Prometheus Users
ry interface but still it > doesn't fire. > > J > > On Thu, Aug 22, 2024 at 10:20 AM 'Brian Candler' via Prometheus Users < > promethe...@googlegroups.com> wrote: > >> Your test example in PromQL browser has: >> confluent_kafka_server_consumer

Re: [prometheus-users] Re: My Query never fires an Alarm

2024-08-22 Thread 'Brian Candler' via Prometheus Users
Your test example in PromQL browser has: confluent_kafka_server_consumer_lag_offsets{job="confluent-cloud"} > 1 and the values were 2 or 3; but the alerting expression has confluent_kafka_server_consumer_lag_offsets{job="confluent-cloud"} > 100 So clearly it's not going to trigger under that condi
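
The mismatch between the two thresholds can be illustrated with a toy model of a PromQL comparison filter. This is only a sketch: the vector is a plain dict, the function name greater_than is invented, and the topic label values are hypothetical; only the sample values 2 and 3 and the thresholds 1 and 100 come from the message.

```python
def greater_than(vector: dict, threshold: float) -> dict:
    """Model `vector > threshold`: keep only the elements above the threshold."""
    return {labels: value for labels, value in vector.items() if value > threshold}


# Hypothetical label values; the sample values 2 and 3 come from the thread.
lag = {
    ("confluent-cloud", "topic-a"): 2,
    ("confluent-cloud", "topic-b"): 3,
}

test_query = greater_than(lag, 1)    # both elements survive the filter
alert_expr = greater_than(lag, 100)  # empty result, so there is nothing to alert on
print(len(test_query), len(alert_expr))  # prints "2 0"
```

An alerting rule produces one alert per element of the expression's result, so an empty result means no alert, however the same metric looks under a lower threshold in the query browser.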

[prometheus-users] Oddity with v0.xxx tags

2024-08-19 Thread 'Brian Candler' via Prometheus Users
I have just noticed a load of tags in the prometheus repo for v0.XXX (from v0.35.0 to v0.54.0 inclusive) which match with v2.XXX. For example: https://github.com/prometheus/prometheus/tree/v0.54.0 https://github.com/prometheus/prometheus/releases/v0.54.0 and which github claims was only tagged l

[prometheus-users] Re: Questions about best way to monitor CPU usage and accuracy and container life-time

2024-08-19 Thread 'Brian Candler' via Prometheus Users
The @ modifier is no longer experimental. It was made a permanent part of Prometheus in Jan 2022, when the promql-at-modifier was made a no-op. It is no longer listed under feature flags. See commit b39f2739e5b01560ad8299d2579f1041a0

[prometheus-users] Re: Suggest any exporter which exports results of KQL query from azure resource to promethues

2024-08-01 Thread 'Brian Candler' via Prometheus Users
At worst, you can use a cronjob script to perform your KQL query periodically, write its results to a file in prometheus text-based exposition format, then pick it up using node_exporter textfile collector (or eve

[prometheus-users] Re: Alertmananger keep send blank alert and how create resolve template

2024-07-31 Thread 'Brian Candler' via Prometheus Users
> So can I turn that blank alert off or I missing any config? You haven't shown any of your config, so it's impossible to comment on it.

[prometheus-users] Re: [Relabel} For specific metric, persist only metrics coming from particular namespace and ignore rest

2024-07-30 Thread 'Brian Candler' via Prometheus Users
Use a temporary label to give the logic "drop if metric name is X and namespace is not Y" Roughly like this (untested): - source_labels: [namespace] regex: 'my_interesting_namespace' target_label: __tmp_keep_namespace replacement: '1' - source_labels: [__name__, __tmp_keep_namespace] re

[prometheus-users] Re: SNMP.yml configuration

2024-07-29 Thread 'Brian Candler' via Prometheus Users
I suggest you use a text editor. But you shouldn't create snmp.yml manually. You should create generator.yml and then use generator + MIB files to convert it to snmp.yml. The format of generator.yml is documented here,

[prometheus-users] Re: Prometheus alert tagging issue - multiple servers

2024-07-27 Thread 'Brian Candler' via Prometheus Users
Q1 - yes, each route can have separate group_by section, as shown in the documentation: https://prometheus.io/docs/alerting/latest/configuration/#route-related-settings Note that if you do *group_by: [instance]* then you'll get one Opsgenie alert group for an instance, even if there are multiple

Re: [prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-23 Thread 'Brian Candler' via Prometheus Users
> * An SNMP context is a collection of management information accessible by an SNMP entity. An item of management information may exist in more than one context and an SNMP entity potentially has access to many contexts [RFC3411 <

Re: [prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-23 Thread 'Brian Candler' via Prometheus Users
https://datatracker.ietf.org/doc/html/rfc3411>]. A context is identified by the snmpEngineID value of the entity hosting the management information (also called a contextEngineID) and a context name that identifies the specific context (also called a contextName).*

[prometheus-users] Re: node exporter's data collection frequency

2024-07-23 Thread 'Brian Candler' via Prometheus Users
> does node exporter use the same method to collect the file system usage stats which the df command uses? Essentially yes. But you can easily exclude certain filesystems and/or types of filesystem from col

Re: [prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-23 Thread 'Brian Candler' via Prometheus Users
rameters. > > This can be included in the auth section of the config[0], or as a URL > parameter in the latest release[1]. > > [0]: > https://github.com/prometheus/snmp_exporter/tree/main/generator#file-format > [1]: https://github.com/prometheus/snmp_exporter/pull/116

[prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-23 Thread 'Brian Candler' via Prometheus Users
> The Cisco switches I am using require you to specify the VLAN context to retrieve the data I'm not sure I follow. Clearly, you "retrieve" the data simply by walking the relevant SNMP MIB, for which you need to specify nothing more than the OID to walk. Are you saying that Cisco have a proprie

Re: [prometheus-users] Counter or Gauge metric?

2024-07-21 Thread 'Brian Candler' via Prometheus Users
On Sunday 21 July 2024 at 00:51:48 UTC+1 Christoph Anton Mitterer wrote: Hey. On Sat, 2024-07-20 at 10:26 -0700, 'Brian Candler' via Prometheus Users wrote: > > If the label stays constant, then the amount of extra space required > is tiny. There is an internal mappi

Re: [prometheus-users] Counter or Gauge metric?

2024-07-20 Thread 'Brian Candler' via Prometheus Users
> If one adds a label to a metric, which then stays mostly constant, does > this add any considerably amount of space needed for storing it? If the label stays constant, then the amount of extra space required is tiny. There is an internal mapping between a bag of labels and a timeseries ID. B

[prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-20 Thread 'Brian Candler' via Prometheus Users
I had a play with this and I think I got most of the way there. Here's generator.yml: modules: bridge_mib: walk: - dot1dBasePortTable - dot1dTpFdbTable lookups: - source_indexes: [dot1dTpFdbAddress] lookup: dot1dTpFdbPort - source_indexes: [dot1dTpFdbPort

[prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-20 Thread 'Brian Candler' via Prometheus Users
I found a relevant issue: https://github.com/prometheus/snmp_exporter/issues/405 Firstly, the PromQL count_values operator can be used to convert a metric value to a label (very neat trick). And secondly,

[prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-20 Thread 'Brian Candler' via Prometheus Users
> dot1dBasePortIfIndex{dot1dBasePort="12"} 12 - *This won't always be the same number* The MIB help text says "The value of the instance of the ifIndex object". So I'm guessing that what you currently get as dot1dBasePortIfIndex{dot1dBasePort="12"} 42 would be more usefully returned as

[prometheus-users] Re: SNMP Exporter - Gathering MAC and IP per port

2024-07-18 Thread 'Brian Candler' via Prometheus Users
> The challenge I am having is using promql to join the data so I can show the IP associated with the MAC address on the physical port. Can you show some examples of the metrics you're trying to join? On Thursday 18 July 2024 at 18:48:35 UTC+1 Matthew Koch wrote: > I am working on a project to

[prometheus-users] Re: Regexp match in template

2024-06-28 Thread 'Brian Candler' via Prometheus Users
See the template functions listed here: https://prometheus.io/docs/prometheus/latest/configuration/template_reference/#strings There is one called "match" which matches regular expressions. (Note that there is not one called "regexMatch") You should also be able to use the Go template global fun

Re: [prometheus-users] Uptime SLA in percentage for metric

2024-06-24 Thread 'Brian Candler' via Prometheus Users
A PromQL query like "mymetric == bool 2" will return 1 when the value is 2, and 0 otherwise. You'll likely need to run this inside a subquery if you're doing time range aggregation over it. But if Grafana is doing the summarization that might not be necessary. On Monday 24 June 2024 at 13:38:
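
To illustrate what the comparison with `bool` buys you for an uptime percentage, here is a small Python sketch that mimics the arithmetic on a list of samples. The sample values and the choice of 2 as the "up" value are hypothetical. In PromQL itself, one possible form (the 30d range and 1m step are assumptions) would be `avg_over_time((mymetric == bool 2)[30d:1m]) * 100`.

```python
def eq_bool(values, target):
    """Mimic `metric == bool target`: 1 where the sample equals target, else 0."""
    return [1 if value == target else 0 for value in values]


def uptime_percent(values, up_value):
    """Percentage of samples in which the metric had the 'up' value."""
    if not values:
        raise ValueError("no samples in the range")
    flags = eq_bool(values, up_value)
    # Averaging a series of 0/1 flags gives the fraction of time spent "up".
    return 100.0 * sum(flags) / len(flags)


samples = [2, 2, 1, 2, 0, 2, 2, 2]  # hypothetical readings; 2 means "up"
print(uptime_percent(samples, 2))   # 6 of 8 samples are "up": 75.0
```

Without `bool`, the comparison filters out the non-matching samples instead of turning them into zeros, so there would be nothing to average over for the "down" periods.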

[prometheus-users] Re: Prometheus LTS EOL

2024-06-23 Thread 'Brian Candler' via Prometheus Users
The release cycle page has been updated: it shows 2.53 will be LTS supported from 2024-07-01 to 2025-07-31. (2.53.0 was actually released a few days ago). It would be nice to have a summary of key differences from 2.45 to 2.53 though. O

[prometheus-users] Re: node_exporter CPU underutilized alert

2024-06-23 Thread 'Brian Candler' via Prometheus Users
node_cpu_seconds_total gives you a separate metric for each CPU, so with an 8 vCPU VM you'll get 8 alerts (if they're all under 20%). You're saying that you're happy with all these alerts, but want to suppress them where the VM has only one vCPU? In that case: count by (instance) (node_cpu_

[prometheus-users] Re: Alertmanager Configuration for Routing Alerts via Telegram

2024-06-20 Thread 'Brian Candler' via Prometheus Users
If you put multiple matchers, they must all be true to match ("AND" semantics). So when you wrote - matchers: - alertname = "SystemdUnitDown" - alertname = "InstanceDown" it means alertname must be simultaneously equal to both those values, which can never be true. One
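
The AND semantics can be sketched in a few lines of Python. This is a toy model of matcher evaluation, not Alertmanager's code; the function names are made up for illustration. The regex matcher at the end is one common way to express "either alert name" and is not necessarily what the truncated message goes on to suggest.

```python
import re


def matcher_holds(labels, name, op, value):
    """Evaluate a single matcher against an alert's label set."""
    actual = labels.get(name, "")  # a missing label is treated as empty
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        # Regex matchers are anchored, so the whole value has to match.
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator: {op}")


def route_matches(labels, matchers):
    """A route matches only if every one of its matchers holds (AND)."""
    return all(matcher_holds(labels, *matcher) for matcher in matchers)


alert = {"alertname": "InstanceDown", "severity": "critical"}

# Two equality matchers on the same label can never both hold.
both_equal = [
    ("alertname", "=", "SystemdUnitDown"),
    ("alertname", "=", "InstanceDown"),
]
# A single regex matcher accepts either value.
either = [("alertname", "=~", "SystemdUnitDown|InstanceDown")]

print(route_matches(alert, both_equal))  # False
print(route_matches(alert, either))      # True
```

Python's `re` and the Go regex engine used by Alertmanager differ in some syntax details, but a plain alternation like this behaves the same in both.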

[prometheus-users] Re: ZTE DSL modem statistics

2024-06-20 Thread 'Brian Candler' via Prometheus Users
Talk to the vendor, or a user group for that device. Your starting point would be firstly to see if it supports SNMP, and if so to get hold of a copy of the MIB. If not, then you'd have to look at whether it exposes that information in any other way, and then write your own exporter to extract

[prometheus-users] Re: job label missing from discoveredLabels (prometheus v2.42.0)

2024-05-31 Thread 'Brian Candler' via Prometheus Users
I don't see this with v2.45.5, and I'm also concerned about why "app": "another-testapp" occurs in one of your discoveredLabels. I suggest you try that, and/or the latest v2.52.1 (you can of course set up a completely separate instance but point it to the same service discovery source) and see

Re: [prometheus-users] how to get count of no.of instance

2024-05-28 Thread 'Brian Candler' via Prometheus Users
Those mangled screenshots are no use. What I would need to see are the actual results of the two queries, from the Prometheus web interface (not Grafana), in plain text: e.g. foo{bar="baz",qux="abc"} 42.0 ...with the *complete* set of labels, not expurgated. That's what's needed to formulate t

Re: [prometheus-users] Pod with Pending phase is in endpoints scraping targets (Prometheus 2.46.0)

2024-05-27 Thread 'Brian Candler' via Prometheus Users
Have you looked in the changelog for Prometheus? I found: ## 2.51.0 / 2024-03-18 * [BUGFIX] Kubernetes SD: Pod status changes were not discovered by Endpoints service discovery #13337

Re: [prometheus-users] how to get count of no.of instance

2024-05-26 Thread 'Brian Candler' via Prometheus Users
The labels for the two sides of the division need to match exactly. If they match 1:1 except for additional labels, then you can use xxx / on (foo,bar) yyy # foo,bar are the matching labels or xxx / ignoring (baz,qux) zzz # baz,qux are the labels to ignore If they match N:1 then you need to u
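
A toy Python model of the one-to-one matching described above may make the `on` and `ignoring` forms easier to picture. It is only a sketch: series are plain (labels, value) pairs, all names and sample data are hypothetical, and results are keyed by the labels used for matching.

```python
import math


def match_key(labels, on=None, ignoring=None):
    """Build the signature used to pair up series from the two sides."""
    items = {k: v for k, v in labels.items() if k != "__name__"}
    if on is not None:
        items = {k: v for k, v in items.items() if k in on}
    elif ignoring is not None:
        items = {k: v for k, v in items.items() if k not in ignoring}
    return tuple(sorted(items.items()))


def index_by_key(series, on, ignoring):
    """Map each signature to its value; one-to-one matching forbids duplicates."""
    indexed = {}
    for labels, value in series:
        key = match_key(labels, on, ignoring)
        if key in indexed:
            raise ValueError(f"duplicate series for match group {dict(key)}")
        indexed[key] = value
    return indexed


def safe_div(numerator, denominator):
    """Float division that yields NaN or +/-Inf on zero instead of raising."""
    if denominator != 0:
        return numerator / denominator
    if numerator == 0 or math.isnan(numerator):
        return math.nan
    return math.copysign(math.inf, numerator)


def divide(lhs, rhs, on=None, ignoring=None):
    """Divide series pairwise; series without a partner are dropped."""
    left = index_by_key(lhs, on, ignoring)
    right = index_by_key(rhs, on, ignoring)
    return {
        key: safe_div(value, right[key])
        for key, value in left.items()
        if key in right
    }


xxx = [({"__name__": "xxx", "instance": "a", "mode": "user"}, 30.0)]
yyy = [({"__name__": "yyy", "instance": "a"}, 60.0)]

print(divide(xxx, yyy))                     # {} because the label sets differ
print(divide(xxx, yyy, on=["instance"]))    # pairs up on instance only
print(divide(xxx, yyy, ignoring=["mode"]))  # same pairing, expressed the other way
```

The sketch raises an error when several series on one side share a signature. That is the N:1 situation, which PromQL handles with the `group_left` and `group_right` modifiers; they are not modelled here.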
