I'll check out gzip encoding. I think the exporter already supports it, but I'll verify.
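
As a rough sanity check of how much gzip saves on metrics-style text (a sketch with a synthetic payload standing in for a real /metrics response, not an actual scrape):

```python
import gzip

# Synthetic stand-in for a /metrics response body. Exposition text is highly
# repetitive (the same label names appear on every line), which is why it
# compresses so well.
payload = b'ifOperStatus{instance="ORDER12345678",ifDescr="Fa0"} 1\n' * 1000

compressed = gzip.compress(payload)
print(f"raw={len(payload)}B gzip={len(compressed)}B "
      f"ratio={len(compressed) / len(payload):.3f}")
```

To verify a live endpoint, something like `curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://<exporter>/metrics` should show `Content-Encoding: gzip` in the response headers if compression is on.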

Thanks again to you both!

On Thursday, October 12, 2023 at 4:18:41 PM UTC+2 Ben Kochie wrote:

> Yes, 75k is no problem.  Typically I tell application developers to keep 
> below 10k metrics per instance. But for inventory stuff like this it's fine 
> to have more. I like to keep individual scrapes below a few hundred 
> thousand.
>
> One pro tip here, make sure your exporter endpoints support gzip http 
> encoding to reduce the response size over the wire.
>
> A decent size Prometheus can handle a few tens of millions of series total.
>
> On Thu, Oct 12, 2023, 13:56 'Sebastiaan van Doesselaar' via Prometheus 
> Users <[email protected]> wrote:
>
>> That makes a lot of sense! Nautobot supports exposing Prometheus metrics, 
>> and I modified my service discovery plugin to do just that.
>> I'm going to fine-tune this a bit more, and then that'll work. I feel like 
>> this was a logical conclusion, but I really needed the hints you guys 
>> gave. Thanks!
>>
>> This will end up exposing 20k * 3-5 series (one for every monitored 
>> interface on 20k devices), so let's say 75k extra metrics in the long run. 
>> I assume Prometheus won't really feel any pain from that? I'm assuming 
>> that for Prometheus, 75k isn't that much on the whole.
>>
>> On Thursday, October 12, 2023 at 11:00:20 AM UTC+2 Brian Candler wrote:
>>
>>> If I understand what you're doing, I wouldn't have 60000 static 
>>> recording rules, I would just create a text file like this:
>>>
>>> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa0"} 1
>>> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa1"} 1
>>> ... etc
>>>
>>> Then I would either stick this on a webserver and scrape it, or drop it 
>>> into a file for node_exporter's textfile collector to pick up. Either way 
>>> would need "honor_labels: true" to preserve the 'instance' label (or you 
>>> can put it in a different label, and then use metric relabelling to move 
>>> it).
>>>
>>> This also solves your problem about changes. There's no need to HUP 
>>> prometheus, it'll update on the next scrape.
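>>>
>>> For completeness, the scrape job for the webserver variant might look 
>>> something like this (a sketch; the job name and target are placeholders):
>>>
>>> scrape_configs:
>>>   - job_name: monitored-interfaces
>>>     honor_labels: true
>>>     static_configs:
>>>       - targets: ['inventory-host:8080']
>>>
>>> Without honor_labels: true, Prometheus renames the exposed 'instance' 
>>> label to 'exported_instance', and the join against the real metrics 
>>> would no longer match.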
>>>
>>> On Wednesday, 11 October 2023 at 14:22:04 UTC+1 Sebastiaan van 
>>> Doesselaar wrote:
>>>
>>>> Thank you very much for the pointers. I'd considered that a recording 
>>>> rule might work during my Google adventures, and as you mention this 
>>>> works very well indeed.
>>>>
>>>> I had to slightly modify your query to:  (ifOperStatus != 1) * on 
>>>> (instance,ifName) monitored_interface_info
>>>>
>>>> Then it gives me the perfect result actually. Probably still needs some 
>>>> finetuning, but that's fine.
>>>>
>>>> To get back to your second suggestion: this unfortunately is not an 
>>>> option for us. We're not always in full control of what we monitor. If 
>>>> we had been, that would be the easier and better solution indeed.
>>>>
>>>> Two questions left:
>>>>
>>>>    - Any recommended/supported way of loading the rules dynamically? I 
>>>>    saw you'd need a SIGHUP to reload them, so I could script it easily. 
>>>>    Preferably I'd use something (natively) supported though, like the 
>>>>    http_sd_config setup we use to do service discovery.
>>>>    - What'll the impact on performance be if we have, say, 60000 
>>>>    recording rules like this (20000 instances, each with 2-5 monitored 
>>>>    interfaces)? I imagine it'll either be peanuts for Prometheus, or 
>>>>    heavier than I imagine.
>>>>
>>>>
>>>> On Wednesday, October 11, 2023 at 10:15:26 AM UTC+2 Ben Kochie wrote:
>>>>
>>>>> For alerting on monitored interfaces I might suggest a different 
>>>>> approach than trying to apply them at discovery time. The discovery phase 
>>>>> is able to apply labels to the whole target device easily, but it's not 
>>>>> really going to work well to annotate individual metrics.
>>>>>
>>>>> What I would suggest is you populate a series of recording rules that 
>>>>> define which interfaces should be alerted on. Then you can use a join at 
>>>>> alert query time. This is also how you can set different alerting 
>>>>> thresholds for things dynamically.
>>>>>
>>>>> For example if you have this rule:
>>>>>
>>>>> groups:
>>>>> - name: monitored interfaces
>>>>>   interval: 1m
>>>>>   rules:
>>>>>     - record: monitored_interface_info
>>>>>       expr: vector(1)
>>>>>       labels:
>>>>>         instance: ORDER12345678
>>>>>         ifDescr: Fa0
>>>>>     - record: monitored_interface_info
>>>>>       expr: vector(1)
>>>>>       labels:
>>>>>         instance: ORDER12345678
>>>>>         ifDescr: Fa1
>>>>>
>>>>> Then your alert would look like this:
>>>>>
>>>>> - name: alerts
>>>>>   rules:
>>>>>     - alert: InterfaceDown
>>>>>       expr: (ifOperStatus == 0) * on (instance,ifDescr) monitored_interface_info
>>>>>       for: 5m
>>>>>
>>>>> You can use the Nautobot database to generate the rules file.
>>>>>
>>>>> Another approach would be to populate the monitored interface 
>>>>> information in your devices. If you can tag the interface 
>>>>> descriptions/aliases with a structured format you can use 
>>>>> metric_relabel_configs to create a monitored_interface label
>>>>>
>>>>> So if your interface description is, say, Fa0;true, you can do 
>>>>> something like this:
>>>>> metric_relabel_configs:
>>>>> - source_labels: [ifDescr]
>>>>>   regex: '.+;(.+)'
>>>>>   target_label: monitored_interface
>>>>> - source_labels: [ifDescr]
>>>>>   regex: '(.+);.+'
>>>>>   target_label: ifDescr
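>>>>>
>>>>> To illustrate what those two rules do (assuming they run in the order 
>>>>> shown, which matters, since monitored_interface has to be extracted 
>>>>> before ifDescr is rewritten), a scraped sample like:
>>>>>
>>>>> ifOperStatus{ifDescr="Fa0;true"} 1
>>>>>
>>>>> ends up stored as:
>>>>>
>>>>> ifOperStatus{ifDescr="Fa0",monitored_interface="true"} 1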
>>>>>
>>>>> On Wed, Oct 11, 2023 at 9:05 AM 'Sebastiaan van Doesselaar' via 
>>>>> Prometheus Users <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I've got a PoC running using Nautobot (source of truth, scrape with 
>>>>>> SD), Prometheus and snmp_exporter. It's scaling very well, much better 
>>>>>> than we anticipated, and we're very eager to put the finishing touches 
>>>>>> on the PoC. 
>>>>>>
>>>>>> I want to set a bunch of interfaces to either "monitored" or not. 
>>>>>> Based on that, I will generate a label. For example: 
>>>>>> "__meta_nautobot_monitored_interfaces": 
>>>>>> "Fa0,Fa1". 
>>>>>>
>>>>>> A full return from Nautobot might be something like:
>>>>>>
>>>>>> [ { "targets": [ "ORDER12345678" ], "labels": { 
>>>>>> "__meta_nautobot_status": "Active", "__meta_nautobot_model": "Device", 
>>>>>> "__meta_nautobot_name": "ORDER12345678", "__meta_nautobot_id": 
>>>>>> "c301aebf-e92d-4f72-8f2b-5768144c42f4", "__meta_nautobot_primary_ip": 
>>>>>> "xxx", "__meta_nautobot_primary_ip4": "xxx", 
>>>>>> "__meta_nautobot_monitored_interfaces": "Fa0,Fa1", 
>>>>>> "__meta_nautobot_role": 
>>>>>> "CPE", "__meta_nautobot_role_slug": "cpe", 
>>>>>> "__meta_nautobot_device_type": 
>>>>>> "ASR9006", "__meta_nautobot_device_type_slug": "asr9006", 
>>>>>> "__meta_nautobot_site": "Place", "__meta_nautobot_site_slug": "place" } 
>>>>>> } ]
>>>>>>
>>>>>>
>>>>>> My question is two-fold:
>>>>>>   - I'd like to drop all unrelated interfaces. I don't know how to do 
>>>>>> that using relabeling with this comma-separated string. I'm open to 
>>>>>> presenting the data in a different way, since I made the Prometheus 
>>>>>> service discovery plugin myself (based on the NetBox one), but I 
>>>>>> haven't thought of a better way.
>>>>>>
>>>>>>   - I only want to alert on the monitored interfaces. If we fix the 
>>>>>> above, this is a non-issue, but if that's not possible, I'd like to at 
>>>>>> least only alert on the monitored interfaces.
>>>>>>
>>>>>> This is going to run on ~20,000 devices with differing configurations, 
>>>>>> models, vendors, etc. It needs to be very dynamic and all sourced from 
>>>>>> Nautobot, as well as be refreshed when changes are made there. Using 
>>>>>> snmp.yml and a generator to do so doesn't seem feasible unless I'd 
>>>>>> create a new module for every possible configuration, and that doesn't 
>>>>>> seem ideal.
>>>>>>
>>>>>> If anyone has any suggestions on how to accomplish this, I'd very 
>>>>>> much appreciate it. 
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Prometheus Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/prometheus-users/fc257af7-6611-4dcd-a0d0-cbf78c1b3a38n%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/prometheus-users/fc257af7-6611-4dcd-a0d0-cbf78c1b3a38n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
