That makes a lot of sense! Nautobot supports exposing Prometheus metrics, 
and I modified my service discovery plugin to do just that.
I'm going to fine-tune this a bit more, and then that'll work. I feel like 
this was the logical conclusion, but I really needed the hints you 
gave. Thanks!

This will end up exposing roughly 20k × 3-5 series (one for every monitored 
interface across 20k devices), so let's say 75k extra series in the long 
run. I assume Prometheus won't really feel any pain from that? My 
assumption is that 75k series isn't much for Prometheus on the whole.
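Roughly, the shape of what the plugin now emits is this (a minimal sketch only; the function name and the payload handling are illustrative, not the actual plugin code):

```python
def interface_info_metrics(sd_entries):
    """Render one monitored_interface_info series per monitored interface,
    in the Prometheus text exposition format.

    `sd_entries` mirrors the http_sd JSON shown further down the thread:
    a list of {"targets": [...], "labels": {...}} dicts.
    """
    lines = []
    for entry in sd_entries:
        instance = entry["targets"][0]
        # Comma-separated interface list from the Nautobot SD labels.
        csv = entry["labels"].get("__meta_nautobot_monitored_interfaces", "")
        for if_descr in filter(None, csv.split(",")):
            lines.append(
                'monitored_interface_info{instance="%s",ifDescr="%s"} 1'
                % (instance, if_descr)
            )
    return "\n".join(lines) + "\n"
```

Scraping the resulting endpoint with honor_labels: true keeps the instance label intact, as Brian described below.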

On Thursday, October 12, 2023 at 11:00:20 AM UTC+2 Brian Candler wrote:

> If I understand what you're doing, I wouldn't have 60000 static recording 
> rules, I would just create a text file like this:
>
> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa0"} 1
> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa1"} 1
> ... etc
>
> Then I would either stick this on a webserver and scrape it, or drop it 
> into a file for node_exporter's textfile collector to pick up. Either way 
> would need "honor_labels: true" to preserve the 'instance' label (or you 
> can put it in a different label, and then use metric relabelling to move 
> it).
>
> This also solves your problem about changes. There's no need to HUP 
> prometheus, it'll update on the next scrape.
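> The scrape side of that would look roughly like this (job name, port and 
> path are made up for illustration):

```yaml
scrape_configs:
  - job_name: monitored_interfaces
    honor_labels: true        # keep the 'instance' label from the file
    metrics_path: /monitored_interfaces.prom
    static_configs:
      - targets: ['inventory-web.example.com:8080']
```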
>
> On Wednesday, 11 October 2023 at 14:22:04 UTC+1 Sebastiaan van Doesselaar 
> wrote:
>
>> Thank you very much for the pointers. I'd considered a recording rule 
>> might work when I did my Google adventures, but as you mention this works 
>> very well indeed.
>>
>> I had to slightly modify your query to:  (ifOperStatus != 1) * on 
>> (instance,ifName) monitored_interface_info
>>
>> Then it gives me the perfect result, actually. It probably still needs 
>> some fine-tuning, but that's fine.
>>
>> To get back to your second suggestion: unfortunately that's not an 
>> option for us, since we're not always in full control of what we 
>> monitor. If we were, that would indeed be the easier and better solution.
>>
>> Two questions left:
>>
>>    - Any recommended/supported way of loading the rules dynamically? I 
>>    saw you'd need a SIGHUP to reload them, so I could script it easily. 
>>    Preferably I'd use something (natively) supported though, like the 
>>    http_sd_config setup we use to do service discovery.
>>    - What will the performance impact be if we have, say, 60000 recording 
>>    rules like this (20000 instances, each with 2-5 monitored interfaces)? 
>>    I imagine it'll either be peanuts for Prometheus, or heavier than I 
>>    imagine.
>>
>>
>> On Wednesday, October 11, 2023 at 10:15:26 AM UTC+2 Ben Kochie wrote:
>>
>>> For alerting on monitored interfaces I might suggest a different 
>>> approach than trying to apply them at discovery time. The discovery phase 
>>> is able to apply labels to the whole target device easily, but it's not 
>>> really going to work well to annotate individual metrics.
>>>
>>> What I would suggest is you populate a series of recording rules that 
>>> define which interfaces should be alerted on. Then you can use a join at 
>>> alert query time. This is also how you can set different alerting 
>>> thresholds for things dynamically.
>>>
>>> For example if you have this rule:
>>>
>>> groups:
>>> - name: monitored interfaces
>>>   interval: 1m
>>>   rules:
>>>     - record: monitored_interface_info
>>>       expr: vector(1)
>>>       labels:
>>>         instance: ORDER12345678
>>>         ifDescr: Fa0
>>>     - record: monitored_interface_info
>>>       expr: vector(1)
>>>       labels:
>>>         instance: ORDER12345678
>>>         ifDescr: Fa1
>>>
>>> Then your alert would look like this:
>>>
>>> - name: alerts
>>>   rules:
>>>     - alert: InterfaceDown
>>>       expr: (ifOperStatus == 0) * on (instance, ifDescr) 
>>> monitored_interface_info
>>>       for: 5m
>>>
>>> You can use the Nautobot database to generate the rules file.
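>>> A generator for that rules file could be as simple as this sketch (the 
>>> function name and input shape are made up; it only shows the YAML 
>>> assembly, not the Nautobot query):

```python
def monitored_interface_rules(interfaces):
    """Emit a Prometheus rule group with one monitored_interface_info
    recording rule per (instance, ifDescr) pair, matching the example
    group above. `interfaces` is an iterable of (instance, ifDescr)
    tuples pulled from Nautobot."""
    rules = []
    for instance, if_descr in interfaces:
        rules.append(
            "    - record: monitored_interface_info\n"
            "      expr: vector(1)\n"
            "      labels:\n"
            f"        instance: {instance}\n"
            f"        ifDescr: {if_descr}\n"
        )
    return (
        "groups:\n"
        "- name: monitored interfaces\n"
        "  interval: 1m\n"
        "  rules:\n" + "".join(rules)
    )
```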
>>>
>>> Another approach would be to populate the monitored interface 
>>> information in your devices. If you can tag the interface 
>>> descriptions/aliases with a structured format you can use 
>>> metric_relabel_configs to create a monitored_interface label.
>>>
>>> So if your interface description is, say, Fa0;true, you can do something 
>>> like this:
>>> metric_relabel_configs:
>>> - source_labels: [ifDescr]
>>>   regex: '.+;(.+)'
>>>   target_label: monitored_interface
>>> - source_labels: [ifDescr]
>>>   regex: '(.+);.+'
>>>   target_label: ifDescr
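>>> To make the two rules concrete, here is a rough Python equivalent 
>>> (Prometheus actually uses anchored RE2 regexes, so the stdlib re module 
>>> is only an approximation):

```python
import re

def split_ifdescr(labels):
    """Apply the two metric_relabel_configs rules above to one label set.
    Rules run in order; the first only writes monitored_interface, and
    the second rewrites ifDescr from its still-unchanged value."""
    out = dict(labels)
    # Rule 1: everything after the last ';' becomes monitored_interface.
    m = re.fullmatch(r'.+;(.+)', out.get('ifDescr', ''))
    if m:
        out['monitored_interface'] = m.group(1)
    # Rule 2: everything before the last ';' becomes the new ifDescr.
    m = re.fullmatch(r'(.+);.+', out.get('ifDescr', ''))
    if m:
        out['ifDescr'] = m.group(1)
    return out
```

An interface with no ';' in its description simply passes through unchanged.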
>>>
>>> On Wed, Oct 11, 2023 at 9:05 AM 'Sebastiaan van Doesselaar' via 
>>> Prometheus Users <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've got a PoC running using nautobot (source of truth, scrape with 
>>>> SD), prometheus and snmp-exporter. It's scaling very well, much better 
>>>> than 
>>>> we anticipated and we're very eager to put the finishing touches on the 
>>>> PoC. 
>>>>
>>>> I want to set a bunch of interfaces to either "monitored" or not. Based 
>>>> on that, I will generate a label. For example: 
>>>> "__meta_nautobot_monitored_interfaces": "Fa0,Fa1". 
>>>>
>>>> A full return from Nautobot might be something like:
>>>>
>>>> [
>>>>   {
>>>>     "targets": ["ORDER12345678"],
>>>>     "labels": {
>>>>       "__meta_nautobot_status": "Active",
>>>>       "__meta_nautobot_model": "Device",
>>>>       "__meta_nautobot_name": "ORDER12345678",
>>>>       "__meta_nautobot_id": "c301aebf-e92d-4f72-8f2b-5768144c42f4",
>>>>       "__meta_nautobot_primary_ip": "xxx",
>>>>       "__meta_nautobot_primary_ip4": "xxx",
>>>>       "__meta_nautobot_monitored_interfaces": "Fa0,Fa1",
>>>>       "__meta_nautobot_role": "CPE",
>>>>       "__meta_nautobot_role_slug": "cpe",
>>>>       "__meta_nautobot_device_type": "ASR9006",
>>>>       "__meta_nautobot_device_type_slug": "asr9006",
>>>>       "__meta_nautobot_site": "Place",
>>>>       "__meta_nautobot_site_slug": "place"
>>>>     }
>>>>   }
>>>> ]
>>>>
>>>>
>>>> My question is two-fold:
>>>>   - I'd like to drop all unrelated interfaces. I don't know how to do 
>>>> that using relabeling with this comma-separated string. I'm open to 
>>>> presenting the data in a different way, since I wrote the Prometheus 
>>>> service discovery plugin myself (based on the NetBox one), but I 
>>>> haven't thought of a better way.
>>>>
>>>>   - I only want to alert on the monitored interfaces. I mean, if we fix 
>>>> the above, this is a non-issue, but if that's not possible, I'd like to at 
>>>> least only alert on the monitored interfaces.
>>>>
>>>> This is going to run on ~20,000 devices with differing configurations, 
>>>> models, vendors, etc. It needs to be very dynamic, all sourced from 
>>>> Nautobot, and refreshed when changes are made there. Using snmp.yml 
>>>> and a generator doesn't seem feasible unless I create a new module for 
>>>> every possible configuration, which doesn't seem ideal.
>>>>
>>>> If anyone has any suggestions on how to accomplish this, I'd very much 
>>>> appreciate it. 
>>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/ba024c97-e51d-47ec-b88f-f407d9b0a55en%40googlegroups.com.
