Actually the targets are on different k8s nodes, but they fail 100% of the time on the Prometheus server where they are down. I pulled the list of all down pod targets and noticed that the number of down targets is the same on both Prometheus servers - 306 - but they are different targets :D Yes, they scrape the same pod URLs.

On Tuesday, 5 September 2023 at 10:32:15 UTC+4, Brian Candler wrote:
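To check whether the 306 down targets really are disjoint sets, one option is to pull the down scrape URLs from each server's `/api/v1/targets` API (`.data.activeTargets[]` entries with `health == "down"`) and diff them. A minimal sketch - the helper name and the sample URLs below are mine, not from the thread:

```python
def diff_down_targets(down_a, down_b):
    """Compare the down scrape URLs reported by two Prometheus servers.

    Returns (only down on A, only down on B, down on both).
    """
    a, b = set(down_a), set(down_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)

# Example with made-up URLs; in practice each list would come from
# GET http://<prometheus>:9090/api/v1/targets
only_a, only_b, both = diff_down_targets(
    ["http://10.0.0.1:8080/metrics", "http://10.0.0.2:8080/metrics"],
    ["http://10.0.0.2:8080/metrics", "http://10.0.0.3:8080/metrics"],
)
```

If "only on A" and "only on B" are both large while "down on both" is small, that points at something local to each scraping server (e.g. network reachability from that node) rather than at the pods themselves.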
> Note that setting the scrape timeout longer than the scrape interval won't achieve anything.
>
> I'd suggest you investigate by looking at the history of the "up" metric: this will go to zero on scrape failures. Can you discern a pattern? Is it only on a certain type of target, or targets running on a particular k8s node? Is it intermittent across all targets, or some targets which fail 100% of the time?
>
> If you compare the Targets page on both servers, are they scraping exactly the same URLs? (That is, check whether service discovery is giving different results.)
>
> On Tuesday, 5 September 2023 at 06:09:55 UTC+1, Анастасия Зель wrote:
>
>> Yes, I see errors on the targets page in the web interface.
>> I tried increasing the timeout to 5 minutes and it changed nothing.
>> It's strange, because Prometheus 2 always gets this error on the same pods, and Prometheus 1 never gets this error on those pods.
>>
>> On Monday, 4 September 2023 at 19:00:32 UTC+4, Brian Candler wrote:
>>
>>> On Monday, 4 September 2023 at 15:49:25 UTC+1, Анастасия Зель wrote:
>>>
>>> Hello, we use HA Prometheus with two servers.
>>>
>>> You mean, two Prometheus servers with the same config, both scraping the same targets?
>>>
>>> The problem is we get different metrics in dashboards from these two servers.
>>>
>>> Small differences are to be expected. That's because the two servers won't be scraping the targets at the same points in time. If you see more significant differences, then please provide some examples.
>>>
>>> And we also scrape metrics from k8s, and some pods are not being scraped because of the error "context deadline exceeded".
>>>
>>> That basically means "scrape timed out". The scrape hadn't completed within the "scrape_timeout:" value that you've set. You'll need to look at your individual exporters and the failing scrape URLs: either the target is not reachable at all (e.g. a firewalling or network configuration issue), or the target is taking too long to respond.
>>>
>>> It's different pods on each server. In the Prometheus logs we don't see any errors.
>>>
>>> Where *do* you see the "context deadline exceeded" errors then?

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/805a2feb-d0ab-4f70-a308-2a2e8a58cee6n%40googlegroups.com.
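On the timeout point discussed above: Prometheus requires `scrape_timeout` to be no greater than `scrape_interval` (the config is rejected otherwise), which is why raising the timeout to 5 minutes against a shorter interval changes nothing. A sketch of the relevant config - the job name and values here are illustrative, not taken from the thread:

```yaml
# scrape_timeout must not exceed scrape_interval, or Prometheus
# rejects the configuration at load time.
scrape_configs:
  - job_name: kubernetes-pods   # example job name
    scrape_interval: 30s
    scrape_timeout: 25s         # keep this <= scrape_interval
```

If a target genuinely needs minutes to respond, the fix is usually on the exporter side (or a longer interval for that one job), not a timeout that exceeds the interval.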

