Re: [prometheus-users] Correlation between snmp scrape time and massive rate output for ifHCInOctets

Alexander Wilke Sat, 16 Mar 2024 01:08:51 -0700

Check the File Format example, in particular the timeout, retries and
max_repetitions settings.

I use max_repetitions of 50 or 100 with Cisco, retries 0, and a timeout 1s or
500ms below the Prometheus scrape timeout.

Ben Kochie wrote on Saturday, 16 March 2024 at 06:31:17 UTC+1:

> This is very likely a problem with counter resets or some other kind of 
> duplicate data.
>
> The best way to figure this out is to perform the query, but without the 
> `rate()` function.
>
> This can be done via the Prometheus UI (harder to do in Grafana) in the 
> "Table" view.
>
> Here is an example demo query 
> <https://prometheus.demo.do.prometheus.io/graph?g0.expr=process_cpu_seconds_total%7Bjob%3D%22prometheus%22%7D%5B2m%5D&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h>
>
> The result is a list of the raw samples, which is what you need to debug this.
>
> On Fri, Mar 15, 2024 at 11:41 PM Nick Carlton <nick.ca...@gmail.com> 
> wrote:
>
>> Hello Everyone,
>>
>> I have just seen something weird in my environment: interface bandwidth on
>> a gigabit switch appeared to reach about 1 Tbps on some of the interfaces.
>>
>> Here is the query I'm using:
>>
>> rate(ifHCInOctets{ifHCInOctetsIntfName=~".*.\\/.*.",instance="<device-name>"}[2m]) * 8
>>
>> I've never had a problem with it before. Here is an image of the graph
>> showing the massive increase in bandwidth and then the drop back to normal:
>>
>> [image: Screenshot 2024-03-15 222353.png]
>>
>> When I investigated what could have happened, I saw that the
>> 'snmp_scrape_duration_seconds' metric increased to around 20s at the time,
>> so the Cisco switch was taking 20 seconds to respond to the SNMP request.
>>
>> [image: Screenshot 2024-03-15 222244.png]
>>
>> I'm a bit confused as to how this could cause the rate query to give 
>> completely false data. Could the delay in data have caused Prometheus to 
>> think there was more bandwidth on the interface? The switch certainly 
>> cannot do the speeds the graph is claiming!
>>
>> I'm on v0.25.0 of the SNMP exporter, and scrapes normally take around 2s. 
>> I'm not blaming the exporter for the high response times; that's 
>> probably the switch. I'm just wondering whether the high response time 
>> could in some way cause the rate query to give incorrect data. The fact 
>> that the graph went back to normal after the high response times makes me 
>> think it wasn't the switch giving duff data.
>>
>> Anyone seen this before and is there any way to mitigate? Happy to 
>> provide more info if required :)
>>
>> Thanks
>> Nick
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to prometheus-use...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/6fd3dca6-2013-47ad-af8f-3344e79954a7n%40googlegroups.com
>>
>
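
To illustrate Ben's point about counter resets, here is a minimal Python sketch. It is not Prometheus's actual implementation (no extrapolation, no staleness handling), only the reset rule: any decrease between two consecutive samples is treated as a restart from zero, so the whole new value is counted as increase. The function names and all sample values are made up for illustration. Whether a backwards step is what happened here can only be confirmed from the raw samples.

```python
def increase_with_reset_handling(samples):
    """Sum the increase over (timestamp, value) samples.

    Simplified: any decrease is treated as a counter reset to zero,
    so the full post-reset value is added. No extrapolation.
    """
    total = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter went backwards: assume it restarted from zero.
            total += value
        else:
            total += value - prev
        prev = value
    return total


def simple_rate(samples):
    """Per-second rate over the window covered by the samples."""
    duration = samples[-1][0] - samples[0][0]
    return increase_with_reset_handling(samples) / duration


# Hypothetical 64-bit octet counter, one sample every 60s, 10 MB/s of traffic.
normal = [
    (0, 50_000_000_000_000),
    (60, 50_000_600_000_000),
    (120, 50_001_200_000_000),
]

# Same counter, but the last sample is slightly LOWER than the one before it
# (for example a stale or duplicate value). This looks like a reset.
glitch = [
    (0, 50_000_000_000_000),
    (60, 50_000_600_000_000),
    (120, 50_000_599_999_000),
]

normal_bps = simple_rate(normal) * 8   # 80_000_000.0, i.e. 80 Mbit/s
glitch_bps = simple_rate(glitch) * 8   # about 3.3e12, i.e. terabits per second

print(f"normal: {normal_bps:.3e} bit/s")
print(f"glitch: {glitch_bps:.3e} bit/s")
```

The counter only stepped back by 1000 bytes in the second series, but because the decrease is read as a reset, the entire counter value lands in the window. With a large ifHCInOctets value that is enough to show terabits per second on a gigabit port.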

