I'll check out gzip encoding. I think it already does, but I'll verify. Thanks again to you both!
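As a quick way to gauge what gzip buys for a metrics payload before checking the exporter itself, this sketch compresses a synthetic metrics file; the series name and sizes are illustrative, not from the thread:

```shell
# Generate a synthetic metrics payload as a stand-in for real exporter output.
for i in $(seq 1 2000); do
  echo "monitored_interface_info{instance=\"ORDER$i\",ifDescr=\"Fa0\"} 1"
done > metrics.txt

# Compress a copy and compare sizes.
gzip -c metrics.txt > metrics.txt.gz
raw=$(wc -c < metrics.txt)
gz=$(wc -c < metrics.txt.gz)
echo "raw=${raw} bytes, gzipped=${gz} bytes"
```

To check a live endpoint (placeholder host/port), something like `curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://my-exporter:9100/metrics` should show a `Content-Encoding: gzip` response header when the exporter supports compression.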
On Thursday, October 12, 2023 at 4:18:41 PM UTC+2 Ben Kochie wrote:

Yes, 75k is no problem. Typically I tell application developers to keep below 10k metrics per instance. But for inventory stuff like this it's fine to have more. I like to keep individual scrapes below a few hundred thousand.

One pro tip here: make sure your exporter endpoints support gzip HTTP encoding to reduce the response size over the wire.

A decent-size Prometheus can handle a few tens of millions of series total.

On Thu, Oct 12, 2023, 13:56 'Sebastiaan van Doesselaar' via Prometheus Users <[email protected]> wrote:

That makes a lot of sense! Nautobot supports exposing Prometheus metrics, and I modified my service discovery plugin to do just that. I'm going to fine-tune this a bit more, and then that'll work. I feel like this was a logical conclusion for this, but I really needed the hints you guys gave. Thanks!

This will end up exposing 20k * 3-5 series (one for every monitored interface on 20k devices), so let's say 75k extra metrics in the long run. I assume Prometheus won't really feel any pain from that? I'm assuming that for Prometheus, 75k isn't that much on the whole.

On Thursday, October 12, 2023 at 11:00:20 AM UTC+2 Brian Candler wrote:

If I understand what you're doing, I wouldn't have 60000 static recording rules; I would just create a text file like this:

    monitored_interface_info{instance="ORDER12345678",ifDescr="Fa0"} 1
    monitored_interface_info{instance="ORDER12345678",ifDescr="Fa1"} 1
    ... etc

Then I would either stick this on a webserver and scrape it, or drop it into a file for node_exporter's textfile collector to pick up. Either way would need "honor_labels: true" to preserve the 'instance' label (or you can put it in a different label, and then use metric relabelling to move it).

This also solves your problem about changes.
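Brian's webserver variant might be wired up with a scrape config along these lines; the job name, target host, and path here are illustrative assumptions, not details from the thread:

```yaml
scrape_configs:
  - job_name: monitored_interfaces        # hypothetical job name
    honor_labels: true                    # keep the 'instance' label from the file
    metrics_path: /monitored_interfaces.txt   # hypothetical path on the webserver
    static_configs:
      - targets: ['my-webserver.example:8080']   # placeholder host
```

With the node_exporter textfile variant, the same file would instead be dropped into the textfile collector's directory with a `.prom` extension, and no extra scrape job would be needed.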
There's no need to HUP Prometheus; it'll update on the next scrape.

On Wednesday, 11 October 2023 at 14:22:04 UTC+1 Sebastiaan van Doesselaar wrote:

Thank you very much for the pointers. I'd considered that a recording rule might work when I did my Google adventures, but as you mention, this works very well indeed.

I had to slightly modify your query to:

    (ifOperStatus != 1) * on (instance, ifName) monitored_interface_info

Then it gives me the perfect result, actually. It probably still needs some fine-tuning, but that's fine.

To get back to your second suggestion: this unfortunately is not an option for us. We're not always in full control of what we monitor, unfortunately. If we had been, that would indeed have been the easier and better solution.

Two questions left:

- Any recommended/supported way of loading the rules dynamically? I saw you'd need a SIGHUP to reload them, so I could script it easily. Preferably I'd use something (natively) supported though, like the http_sd_config setup we use to do service discovery.
- What'll the impact on performance be if we have, say, 60000 recording rules like this (20000 instances, each with 2-5 monitored interfaces)? I imagine it'll either be peanuts for Prometheus, or heavier than I imagine.

On Wednesday, October 11, 2023 at 10:15:26 AM UTC+2 Ben Kochie wrote:

For alerting on monitored interfaces I might suggest a different approach than trying to apply them at discovery time. The discovery phase is able to apply labels to the whole target device easily, but it's not really going to work well for annotating individual metrics.

What I would suggest is that you populate a series of recording rules that define which interfaces should be alerted on. Then you can use a join at alert query time.
This is also how you can set different alerting thresholds for things dynamically.

For example, if you have this rule:

    groups:
    - name: monitored interfaces
      interval: 1m
      rules:
      - record: monitored_interface_info
        expr: vector(1)
        labels:
          instance: ORDER12345678
          ifDescr: Fa0
      - record: monitored_interface_info
        expr: vector(1)
        labels:
          instance: ORDER12345678
          ifDescr: Fa1

Then your alert would look like this:

    - name: alerts
      rules:
      - alert: InterfaceDown
        expr: ifOperStatus == 0 * on (instance, ifDescr) monitored_interface_info
        for: 5m

You can use the Nautobot database to generate the rules file.

Another approach would be to populate the monitored-interface information in your devices. If you can tag the interface descriptions/aliases with a structured format, you can use metric_relabel_configs to create a monitored_interface label.

So if your interface description is, say, Fa0;true, you can do something like this:

    metric_relabel_configs:
    - source_labels: [ifDescr]
      regex: '.+;(.+)'
      target_label: monitored_interface
    - source_labels: [ifDescr]
      regex: '(.+);.+'
      target_label: ifDescr

On Wed, Oct 11, 2023 at 9:05 AM 'Sebastiaan van Doesselaar' via Prometheus Users <[email protected]> wrote:

Hi all,

I've got a PoC running using Nautobot (source of truth, scrape with SD), Prometheus and snmp-exporter. It's scaling very well, much better than we anticipated, and we're very eager to put the finishing touches on the PoC.

I want to set a bunch of interfaces to either "monitored" or not. Based on that, I will generate a label. For example: "__meta_nautobot_monitored_interfaces": "Fa0,Fa1".
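The two metric_relabel_configs regexes in Ben's reply above can be sanity-checked outside Prometheus. This sketch applies the same capture groups to a sample interface description with sed; note that Prometheus fully anchors relabel regexes, so the sed patterns add explicit anchors to match:

```shell
# Split a structured interface description like "Fa0;true" the same way
# the relabel rules do: text after ';' -> monitored_interface,
# text before ';' -> the cleaned ifDescr.
desc='Fa0;true'
monitored=$(printf '%s' "$desc" | sed -E 's/^.+;(.+)$/\1/')
ifdescr=$(printf '%s' "$desc" | sed -E 's/^(.+);.+$/\1/')
echo "monitored_interface=${monitored} ifDescr=${ifdescr}"
```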
A full return from Nautobot might be something like:

    [
      {
        "targets": [ "ORDER12345678" ],
        "labels": {
          "__meta_nautobot_status": "Active",
          "__meta_nautobot_model": "Device",
          "__meta_nautobot_name": "ORDER12345678",
          "__meta_nautobot_id": "c301aebf-e92d-4f72-8f2b-5768144c42f4",
          "__meta_nautobot_primary_ip": "xxx",
          "__meta_nautobot_primary_ip4": "xxx",
          "__meta_nautobot_monitored_interfaces": "Fa0,Fa1",
          "__meta_nautobot_role": "CPE",
          "__meta_nautobot_role_slug": "cpe",
          "__meta_nautobot_device_type": "ASR9006",
          "__meta_nautobot_device_type_slug": "asr9006",
          "__meta_nautobot_site": "Place",
          "__meta_nautobot_site_slug": "place"
        }
      }
    ]

My question is two-fold:

- I'd like to drop all unrelated interfaces. I don't know how to do that using relabeling with this comma-separated string. I'm open to presenting the data in a different way, since I made the Prometheus service discovery plugin myself (based on the Netbox one), but I haven't thought of a better way.
- I only want to alert on the monitored interfaces. If we fix the above, this is a non-issue, but if that's not possible, I'd like to at least only alert on the monitored interfaces.

This is going to be run on ~20,000 devices with all differing configurations, models, vendors, etc. It needs to be very dynamic and all sourced from Nautobot, as well as be refreshed when changes are made there. Using snmp.yml and a generator to do so doesn't seem feasible unless I'd create a new module for every possible configuration, and that doesn't seem ideal.

If anyone has any suggestions on how to accomplish this, I'd very much appreciate it.
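The comma-separated __meta_nautobot_monitored_interfaces value in the SD response above can be expanded into one monitored_interface_info series per interface, matching the textfile format discussed in the thread. This is a minimal sketch for a single device, with the target and interface list hard-coded from the example; a real generator would iterate over the full SD JSON (e.g. with jq):

```shell
# Expand "Fa0,Fa1" into one monitored_interface_info series per interface.
target='ORDER12345678'
interfaces='Fa0,Fa1'   # value of __meta_nautobot_monitored_interfaces

echo "$interfaces" | tr ',' '\n' | while read -r ifd; do
  printf 'monitored_interface_info{instance="%s",ifDescr="%s"} 1\n' "$target" "$ifd"
done > monitored_interfaces.prom

cat monitored_interfaces.prom
```

The resulting file contains the two example series (`ifDescr="Fa0"` and `ifDescr="Fa1"`, each with `instance="ORDER12345678"`), ready to serve from a webserver or drop into node_exporter's textfile collector directory.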
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/c0cb57d1-0394-46a1-ac7e-9a60fd19e9e7n%40googlegroups.com.

