Yes, 75k is no problem. I typically tell application developers to keep below 10k metrics per instance, but for inventory data like this it's fine to have more. I like to keep individual scrapes below a few hundred thousand series.
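As a quick way to check where a setup stands against those numbers, Prometheus records the sample count of each scrape itself in the built-in `scrape_samples_scraped` metric; a query along these lines (not from the thread, just a standard way to spot heavy targets) shows the largest scrapes:

```promql
# Top 10 targets by number of samples in their most recent scrape
topk(10, scrape_samples_scraped)
```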
One pro tip here: make sure your exporter endpoints support gzip HTTP content encoding to reduce the response size over the wire. A decent-sized Prometheus can handle a few tens of millions of series total.

On Thu, Oct 12, 2023, 13:56 'Sebastiaan van Doesselaar' via Prometheus Users <[email protected]> wrote:

> That makes a lot of sense! Nautobot supports exposing Prometheus metrics, and I modified my service discovery plugin to do just that. I'm going to fine-tune this a bit more, and then that'll work. I feel like this was a logical conclusion, but I really needed the hints you guys gave. Thanks!
>
> This will end up exposing 20k*3-5 (1 for every monitored interface on 20k devices), so let's say 75k extra metrics in the long run. I assume Prometheus won't really feel any pain from that? I'm assuming that for Prometheus, 75k isn't much on the whole.
>
> On Thursday, October 12, 2023 at 11:00:20 AM UTC+2 Brian Candler wrote:
>
>> If I understand what you're doing, I wouldn't have 60000 static recording rules; I would just create a text file like this:
>>
>> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa0"} 1
>> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa1"} 1
>> ... etc
>>
>> Then I would either stick this on a webserver and scrape it, or drop it into a file for node_exporter's textfile collector to pick up. Either way would need "honor_labels: true" to preserve the 'instance' label (or you can put it in a different label, and then use metric relabelling to move it).
>>
>> This also solves your problem about changes. There's no need to HUP Prometheus; it'll update on the next scrape.
>>
>> On Wednesday, 11 October 2023 at 14:22:04 UTC+1 Sebastiaan van Doesselaar wrote:
>>
>>> Thank you very much for the pointers. I'd considered that a recording rule might work when I did my Google adventures, and as you mention, this works very well indeed.
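A minimal sketch of a scrape job for the webserver variant Brian describes might look like the following (the job name, target address, and metrics path are illustrative assumptions; only `honor_labels: true` comes from the thread):

```yaml
scrape_configs:
  - job_name: monitored-interfaces                 # hypothetical job name
    honor_labels: true                             # keep the 'instance' label from the scraped file
    metrics_path: /monitored_interfaces.prom       # assumed path on your webserver
    static_configs:
      - targets: ['inventory-web.example.internal:8080']   # assumed host:port
```

With the node_exporter textfile-collector variant, no extra job is needed: the file is picked up by the existing node_exporter scrape, though `honor_labels: true` is still required on that job to preserve the `instance` label.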
>>>
>>> I had to slightly modify your query to:
>>>
>>> (ifOperStatus != 1) * on (instance,ifName) monitored_interface_info
>>>
>>> Then it gives me the perfect result, actually. Probably still needs some fine-tuning, but that's fine.
>>>
>>> To get back to your second suggestion: this unfortunately is not an option for us. We're not always in full control of what we monitor, unfortunately. If we had been, that would have been the easier and better solution indeed.
>>>
>>> Two questions left:
>>>
>>>    - Any recommended/supported way of loading the rules dynamically? I saw you'd need a SIGHUP to reload them, so I could script it easily. Preferably I'd use something (natively) supported though, like the http_sd_config setup we use to do service discovery.
>>>    - What'll the impact on performance be if we have, say, 60000 recording rules like this (20000 instances, each with 2-5 monitored interfaces)? I imagine it'll either be peanuts for Prometheus, or heavier than I imagine.
>>>
>>> On Wednesday, October 11, 2023 at 10:15:26 AM UTC+2 Ben Kochie wrote:
>>>
>>>> For alerting on monitored interfaces I might suggest a different approach than trying to apply them at discovery time. The discovery phase is able to apply labels to the whole target device easily, but it's not really going to work well to annotate individual metrics.
>>>>
>>>> What I would suggest is that you populate a series of recording rules that define which interfaces should be alerted on. Then you can use a join at alert query time. This is also how you can set different alerting thresholds for things dynamically.
>>>>
>>>> For example, if you have this rule:
>>>>
>>>> groups:
>>>>   - name: monitored interfaces
>>>>     interval: 1m
>>>>     rules:
>>>>       - record: monitored_interface_info
>>>>         expr: vector(1)
>>>>         labels:
>>>>           instance: ORDER12345678
>>>>           ifDescr: Fa0
>>>>       - record: monitored_interface_info
>>>>         expr: vector(1)
>>>>         labels:
>>>>           instance: ORDER12345678
>>>>           ifDescr: Fa1
>>>>
>>>> Then your alert would look like this:
>>>>
>>>> - name: alerts
>>>>   rules:
>>>>     - alert: InterfaceDown
>>>>       expr: ifOperStatus == 0 * on (instance,ifDescr) monitored_interface_info
>>>>       for: 5m
>>>>
>>>> You can use the Nautobot database to generate the rules file.
>>>>
>>>> Another approach would be to populate the monitored-interface information in your devices. If you can tag the interface descriptions/aliases with a structured format, you can use metric_relabel_configs to create a monitored_interface label.
>>>>
>>>> So if your interface description is, say, Fa0;true, you can do something like this:
>>>>
>>>> metric_relabel_configs:
>>>>   - source_labels: [ifDescr]
>>>>     regex: '.+;(.+)'
>>>>     target_label: monitored_interface
>>>>   - source_labels: [ifDescr]
>>>>     regex: '(.+);.+'
>>>>     target_label: ifDescr
>>>>
>>>> On Wed, Oct 11, 2023 at 9:05 AM 'Sebastiaan van Doesselaar' via Prometheus Users <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've got a PoC running using Nautobot (source of truth, scrape with SD), Prometheus and snmp-exporter. It's scaling very well, much better than we anticipated, and we're very eager to put the finishing touches on the PoC.
>>>>>
>>>>> I want to set a bunch of interfaces to either "monitored" or not. Based on that, I will generate a label. For example: "__meta_nautobot_monitored_interfaces": "Fa0,Fa1".
>>>>>
>>>>> A full return from Nautobot might be something like:
>>>>>
>>>>> [
>>>>>   {
>>>>>     "targets": ["ORDER12345678"],
>>>>>     "labels": {
>>>>>       "__meta_nautobot_status": "Active",
>>>>>       "__meta_nautobot_model": "Device",
>>>>>       "__meta_nautobot_name": "ORDER12345678",
>>>>>       "__meta_nautobot_id": "c301aebf-e92d-4f72-8f2b-5768144c42f4",
>>>>>       "__meta_nautobot_primary_ip": "xxx",
>>>>>       "__meta_nautobot_primary_ip4": "xxx",
>>>>>       "__meta_nautobot_monitored_interfaces": "Fa0,Fa1",
>>>>>       "__meta_nautobot_role": "CPE",
>>>>>       "__meta_nautobot_role_slug": "cpe",
>>>>>       "__meta_nautobot_device_type": "ASR9006",
>>>>>       "__meta_nautobot_device_type_slug": "asr9006",
>>>>>       "__meta_nautobot_site": "Place",
>>>>>       "__meta_nautobot_site_slug": "place"
>>>>>     }
>>>>>   }
>>>>> ]
>>>>>
>>>>> My question is two-fold:
>>>>>
>>>>>    - I'd like to drop all unrelated interfaces. I don't know how to do that using relabeling with this comma-separated string. I'm open to presenting the data in a different way, since I made the Prometheus service discovery plugin myself (based on the Netbox one), but I haven't thought of a better way.
>>>>>
>>>>>    - I only want to alert on the monitored interfaces. I mean, if we fix the above, this is a non-issue, but if that's not possible, I'd like to at least only alert on the monitored interfaces.
>>>>>
>>>>> This is going to be run on ~20,000 devices with all differing configurations, models, vendors, etc. It needs to be very dynamic and all sourced from Nautobot, as well as be refreshed when changes are made there. Using snmp.yml and a generator to do so doesn't seem feasible unless I'd create a new module for every possible configuration, and that doesn't seem ideal.
>>>>>
>>>>> If anyone has any suggestions on how to accomplish this, I'd very much appreciate it.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/fc257af7-6611-4dcd-a0d0-cbf78c1b3a38n%40googlegroups.com.
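For completeness, the http_sd_config setup mentioned in the thread plugs into Prometheus roughly as follows (the URL, job name, and refresh interval are assumptions; the `__meta_nautobot_monitored_interfaces` label name comes from the sample payload above):

```yaml
scrape_configs:
  - job_name: snmp-devices                       # hypothetical job name
    http_sd_configs:
      - url: http://nautobot.example.internal/api/plugins/sd/targets   # assumed SD endpoint
        refresh_interval: 5m                     # assumed; targets refresh without a reload
    relabel_configs:
      # Copy the comma-separated list into a regular label if you want it to
      # survive relabeling; __meta_* labels are dropped after service discovery.
      - source_labels: [__meta_nautobot_monitored_interfaces]
        target_label: monitored_interfaces
```

Because HTTP service discovery re-fetches targets on each refresh interval, changes in Nautobot propagate without a SIGHUP, which is the "natively supported" behavior asked about earlier in the thread.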

