Thank you very much for the pointers. I'd considered that a recording rule
might work while I was searching around, and as you mention, it works very
well indeed.

I had to slightly modify your query to:

    (ifOperStatus != 1) * on (instance,ifName) monitored_interface_info

That gives me exactly the result I wanted. It probably still needs some
fine-tuning, but that's fine.
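
For reference, a minimal sketch of how the two rule groups fit together with
that change. The group names and label values are taken from the example
quoted below; keying the info series on ifName instead of ifDescr assumes the
scraped ifOperStatus series carry an ifName label, so adjust it to whichever
label your exporter module sets.

```yaml
groups:
  - name: monitored interfaces
    interval: 1m
    rules:
      # One constant series per interface that should alert.
      # The label values here are example data, not real devices.
      - record: monitored_interface_info
        expr: vector(1)
        labels:
          instance: ORDER12345678
          ifName: Fa0
  - name: alerts
    rules:
      - alert: InterfaceDown
        # The comparison keeps only interfaces that are not up (1);
        # the join on (instance, ifName) then keeps only those that
        # have a matching monitored_interface_info series.
        expr: (ifOperStatus != 1) * on (instance, ifName) monitored_interface_info
        for: 5m
```

The parentheses matter: in PromQL, * binds tighter than the comparison
operators, so without them the multiplication would be applied first.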

To get back to your second suggestion: unfortunately this is not an option
for us, because we're not always in full control of what we monitor. If we
were, that would indeed be the easier and better solution.

Two questions left:

   - Is there a recommended/supported way of loading the rules dynamically?
   I saw you'd need a SIGHUP to reload them, so I could script it easily.
   Preferably I'd use something natively supported though, like the
   http_sd_configs setup we use for service discovery.
   - What will the performance impact be if we have, say, 60000 recording
   rules like this (20000 instances, each with 2-5 monitored interfaces)? I
   imagine it'll either be peanuts for Prometheus, or heavier than I expect.
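
For the scripted route, a rough sketch of the Prometheus side, assuming a
generator (not shown) writes the rule files from Nautobot into a single
directory. The directory path is hypothetical. rule_files accepts globs, so
newly written files would be picked up on the next reload, i.e. the SIGHUP
mentioned above.

```yaml
# prometheus.yml (fragment)
rule_files:
  # Hypothetical directory that the generator script writes into.
  - /etc/prometheus/rules/generated/*.yml
```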


On Wednesday, October 11, 2023 at 10:15:26 AM UTC+2 Ben Kochie wrote:

> For alerting on monitored interfaces I might suggest a different approach 
> than trying to apply them at discovery time. The discovery phase is able to 
> apply labels to the whole target device easily, but it's not really going 
> to work well to annotate individual metrics.
>
> What I would suggest is you populate a series of recording rules that 
> define which interfaces should be alerted on. Then you can use a join at 
> alert query time. This is also how you can set different alerting 
> thresholds for things dynamically.
>
> For example if you have this rule:
>
> groups:
> - name: monitored interfaces
>   interval: 1m
>   rules:
>     - record: monitored_interface_info
>       expr: vector(1)
>       labels:
>         instance: ORDER12345678
>         ifDescr: Fa0
>     - record: monitored_interface_info
>       expr: vector(1)
>       labels:
>         instance: ORDER12345678
>         ifDescr: Fa1
>
> Then your alert would look like this:
>
> - name: alerts
>   rules:
>     - alert: InterfaceDown
>       expr: ifOperStatus == 0 * on (instance,ifDescr) monitored_interface_info
>       for: 5m
>
> You can use the Nautobot database to generate the rules file.
>
> Another approach would be to populate the monitored interface information 
> in your devices. If you can tag the interface descriptions/aliases with a 
> structured format you can use metric_relabel_configs to create a 
> monitored_interface label.
>
> So if your interface description is, say, Fa0;true, you can do something
> like this:
> metric_relabel_configs:
> - source_labels: [ifDescr]
>   regex: '.+;(.+)'
>   target_label: monitored_interface
> - source_labels: [ifDescr]
>   regex: '(.+);.+'
>   target_label: ifDescr
>
> On Wed, Oct 11, 2023 at 9:05 AM 'Sebastiaan van Doesselaar' via Prometheus 
> Users <[email protected]> wrote:
>
>> Hi all,
>>
>> I've got a PoC running using Nautobot (source of truth, scrape with SD),
>> Prometheus and snmp-exporter. It's scaling very well, much better than we
>> anticipated, and we're very eager to put the finishing touches on the PoC.
>>
>> I want to set a bunch of interfaces to either "monitored" or not. Based
>> on that, I will generate a label. For example:
>> "__meta_nautobot_monitored_interfaces": "Fa0,Fa1".
>>
>> A full return from Nautobot might be something like:
>>
>> [
>>   {
>>     "targets": [ "ORDER12345678" ],
>>     "labels": {
>>       "__meta_nautobot_status": "Active",
>>       "__meta_nautobot_model": "Device",
>>       "__meta_nautobot_name": "ORDER12345678",
>>       "__meta_nautobot_id": "c301aebf-e92d-4f72-8f2b-5768144c42f4",
>>       "__meta_nautobot_primary_ip": "xxx",
>>       "__meta_nautobot_primary_ip4": "xxx",
>>       "__meta_nautobot_monitored_interfaces": "Fa0,Fa1",
>>       "__meta_nautobot_role": "CPE",
>>       "__meta_nautobot_role_slug": "cpe",
>>       "__meta_nautobot_device_type": "ASR9006",
>>       "__meta_nautobot_device_type_slug": "asr9006",
>>       "__meta_nautobot_site": "Place",
>>       "__meta_nautobot_site_slug": "place"
>>     }
>>   }
>> ]
>>
>>
>> My question is two-fold:
>>   - I'd like to drop all unrelated interfaces. I don't know how to do 
>> that using relabeling with this comma separated string. I'm open to 
>> presenting the data in a different way, since I made the prometheus service 
>> discovery plugin myself (based on the netbox one), but I haven't thought of 
>> a better way.
>>
>>   - I only want to alert on the monitored interfaces. I mean, if we fix 
>> the above, this is a non-issue, but if that's not possible, I'd like to at 
>> least only alert on the monitored interfaces.
>>
>> This is going to be run on ~20,000 devices, all with differing
>> configurations, models, vendors, etc. It needs to be very dynamic and all
>> sourced from Nautobot, as well as be refreshed when changes are made there.
>> Using snmp.yml and a generator to do so doesn't seem feasible unless I
>> create a new module for every possible configuration, and that doesn't seem
>> ideal.
>>
>> If anyone has any suggestions on how to accomplish this, I'd very much 
>> appreciate it. 
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/fc257af7-6611-4dcd-a0d0-cbf78c1b3a38n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/fc257af7-6611-4dcd-a0d0-cbf78c1b3a38n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

