Yes, 75k is no problem. I typically tell application developers to keep below 10k metrics per instance, but for inventory data like this it's fine to have more. I like to keep individual scrapes below a few hundred thousand series.
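As a quick way to check where a setup stands against those numbers, Prometheus records the sample count of each scrape itself in the built-in `scrape_samples_scraped` metric; a query along these lines (not from the thread, just a standard way to spot heavy targets) shows the largest scrapes:

```promql
# Top 10 targets by number of samples in their most recent scrape
topk(10, scrape_samples_scraped)
```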
One pro tip here: make sure your exporter endpoints support gzip HTTP content encoding to reduce the response size over the wire. A decent-sized Prometheus can handle a few tens of millions of series total.

On Thu, Oct 12, 2023, 13:56 'Sebastiaan van Doesselaar' via Prometheus Users <[email protected]> wrote:

> That makes a lot of sense! Nautobot supports exposing Prometheus metrics, and I modified my service discovery plugin to do just that. I'm going to fine-tune this a bit more, and then that'll work. I feel like this was a logical conclusion, but I really needed the hints you guys gave. Thanks!
>
> This will end up exposing 20k*3-5 (1 for every monitored interface on 20k devices), so let's say 75k extra metrics in the long run. I assume Prometheus won't really feel any pain from that? I'm assuming that for Prometheus, 75k isn't much on the whole.
>
> On Thursday, October 12, 2023 at 11:00:20 AM UTC+2 Brian Candler wrote:
>
>> If I understand what you're doing, I wouldn't have 60000 static recording rules; I would just create a text file like this:
>>
>> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa0"} 1
>> monitored_interface_info{instance="ORDER12345678",ifDescr="Fa1"} 1
>> ... etc
>>
>> Then I would either stick this on a webserver and scrape it, or drop it into a file for node_exporter's textfile collector to pick up. Either way would need "honor_labels: true" to preserve the 'instance' label (or you can put it in a different label, and then use metric relabelling to move it).
>>
>> This also solves your problem about changes. There's no need to HUP Prometheus; it'll update on the next scrape.
>>
>> On Wednesday, 11 October 2023 at 14:22:04 UTC+1 Sebastiaan van Doesselaar wrote:
>>
>>> Thank you very much for the pointers. I'd considered that a recording rule might work when I did my Google adventures, and as you mention, this works very well indeed.
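A minimal sketch of a scrape job for the webserver variant Brian describes might look like the following (the job name, target address, and metrics path are illustrative assumptions; only `honor_labels: true` comes from the thread):

```yaml
scrape_configs:
  - job_name: monitored-interfaces                 # hypothetical job name
    honor_labels: true                             # keep the 'instance' label from the scraped file
    metrics_path: /monitored_interfaces.prom       # assumed path on your webserver
    static_configs:
      - targets: ['inventory-web.example.internal:8080']   # assumed host:port
```

With the node_exporter textfile-collector variant, no extra job is needed: the file is picked up by the existing node_exporter scrape, though `honor_labels: true` is still required on that job to preserve the `instance` label.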
>>>
>>> I had to slightly modify your query to:
>>>
>>> (ifOperStatus != 1) * on (instance,ifName) monitored_interface_info
>>>
>>> Then it gives me the perfect result, actually. Probably still needs some fine-tuning, but that's fine.
>>>
>>> To get back to your second suggestion: this unfortunately is not an option for us. We're not always in full control of what we monitor, unfortunately. If we had been, that would have been the easier and better solution indeed.
>>>
>>> Two questions left:
>>>
>>>    - Any recommended/supported way of loading the rules dynamically? I saw you'd need a SIGHUP to reload them, so I could script it easily. Preferably I'd use something (natively) supported though, like the http_sd_config setup we use to do service discovery.
>>>    - What'll the impact on performance be if we have, say, 60000 recording rules like this (20000 instances, each with 2-5 monitored interfaces)? I imagine it'll either be peanuts for Prometheus, or heavier than I imagine.
>>>
>>> On Wednesday, October 11, 2023 at 10:15:26 AM UTC+2 Ben Kochie wrote:
>>>
>>>> For alerting on monitored interfaces I might suggest a different approach than trying to apply them at discovery time. The discovery phase is able to apply labels to the whole target device easily, but it's not really going to work well to annotate individual metrics.
>>>>
>>>> What I would suggest is that you populate a series of recording rules that define which interfaces should be alerted on. Then you can use a join at alert query time. This is also how you can set different alerting thresholds for things dynamically.
>>>>
>>>> For example, if you have this rule:
>>>>
>>>> groups:
>>>>   - name: monitored interfaces
>>>>     interval: 1m
>>>>     rules:
>>>>       - record: monitored_interface_info
>>>>         expr: vector(1)
>>>>         labels:
>>>>           instance: ORDER12345678
>>>>           ifDescr: Fa0
>>>>       - record: monitored_interface_info
>>>>         expr: vector(1)
>>>>         labels:
>>>>           instance: ORDER12345678
>>>>           ifDescr: Fa1
>>>>
>>>> Then your alert would look like this:
>>>>
>>>> - name: alerts
>>>>   rules:
>>>>     - alert: InterfaceDown
>>>>       expr: ifOperStatus == 0 * on (instance,ifDescr) monitored_interface_info
>>>>       for: 5m
>>>>
>>>> You can use the Nautobot database to generate the rules file.
>>>>
>>>> Another approach would be to populate the monitored-interface information in your devices. If you can tag the interface descriptions/aliases with a structured format, you can use metric_relabel_configs to create a monitored_interface label.
>>>>
>>>> So if your interface description is, say, Fa0;true, you can do something like this:
>>>>
>>>> metric_relabel_configs:
>>>>   - source_labels: [ifDescr]
>>>>     regex: '.+;(.+)'
>>>>     target_label: monitored_interface
>>>>   - source_labels: [ifDescr]
>>>>     regex: '(.+);.+'
>>>>     target_label: ifDescr
>>>>
>>>> On Wed, Oct 11, 2023 at 9:05 AM 'Sebastiaan van Doesselaar' via Prometheus Users <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've got a PoC running using Nautobot (source of truth, scrape with SD), Prometheus and snmp-exporter. It's scaling very well, much better than we anticipated, and we're very eager to put the finishing touches on the PoC.
>>>>>
>>>>> I want to set a bunch of interfaces to either "monitored" or not. Based on that, I will generate a label. For example: "__meta_nautobot_monitored_interfaces": "Fa0,Fa1".
>>>>>
>>>>> A full return from Nautobot might be something like:
>>>>>
>>>>> [
>>>>>   {
>>>>>     "targets": ["ORDER12345678"],
>>>>>     "labels": {
>>>>>       "__meta_nautobot_status": "Active",
>>>>>       "__meta_nautobot_model": "Device",
>>>>>       "__meta_nautobot_name": "ORDER12345678",
>>>>>       "__meta_nautobot_id": "c301aebf-e92d-4f72-8f2b-5768144c42f4",
>>>>>       "__meta_nautobot_primary_ip": "xxx",
>>>>>       "__meta_nautobot_primary_ip4": "xxx",
>>>>>       "__meta_nautobot_monitored_interfaces": "Fa0,Fa1",
>>>>>       "__meta_nautobot_role": "CPE",
>>>>>       "__meta_nautobot_role_slug": "cpe",
>>>>>       "__meta_nautobot_device_type": "ASR9006",
>>>>>       "__meta_nautobot_device_type_slug": "asr9006",
>>>>>       "__meta_nautobot_site": "Place",
>>>>>       "__meta_nautobot_site_slug": "place"
>>>>>     }
>>>>>   }
>>>>> ]
>>>>>
>>>>> My question is two-fold:
>>>>>
>>>>>    - I'd like to drop all unrelated interfaces. I don't know how to do that using relabeling with this comma-separated string. I'm open to presenting the data in a different way, since I made the Prometheus service discovery plugin myself (based on the Netbox one), but I haven't thought of a better way.
>>>>>
>>>>>    - I only want to alert on the monitored interfaces. I mean, if we fix the above, this is a non-issue, but if that's not possible, I'd like to at least only alert on the monitored interfaces.
>>>>>
>>>>> This is going to be run on ~20,000 devices with all differing configurations, models, vendors, etc. It needs to be very dynamic and all sourced from Nautobot, as well as be refreshed when changes are made there. Using snmp.yml and a generator to do so doesn't seem feasible unless I'd create a new module for every possible configuration, and that doesn't seem ideal.
>>>>>
>>>>> If anyone has any suggestions on how to accomplish this, I'd very much appreciate it.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/fc257af7-6611-4dcd-a0d0-cbf78c1b3a38n%40googlegroups.com.
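For completeness, the http_sd_config setup mentioned in the thread plugs into Prometheus roughly as follows (the URL, job name, and refresh interval are assumptions; the `__meta_nautobot_monitored_interfaces` label name comes from the sample payload above):

```yaml
scrape_configs:
  - job_name: snmp-devices                       # hypothetical job name
    http_sd_configs:
      - url: http://nautobot.example.internal/api/plugins/sd/targets   # assumed SD endpoint
        refresh_interval: 5m                     # assumed; targets refresh without a reload
    relabel_configs:
      # Copy the comma-separated list into a regular label if you want it to
      # survive relabeling; __meta_* labels are dropped after service discovery.
      - source_labels: [__meta_nautobot_monitored_interfaces]
        target_label: monitored_interfaces
```

Because HTTP service discovery re-fetches targets on each refresh interval, changes in Nautobot propagate without a SIGHUP, which is the "natively supported" behavior asked about earlier in the thread.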

