[prometheus-users] Re: Prometheus HA different metrics

Brian Candler Mon, 04 Sep 2023 23:32:20 -0700

Note that setting the scrape timeout longer than the scrape interval won't 
achieve anything.


I'd suggest you investigate by looking at the history of the "up" metric: 
this will go to zero on scrape failures.  Can you discern a pattern?  Is it 
only on a certain type of target, or targets running on a particular k8s 
node?  Is it intermittent across all targets, or some targets which fail 
100% of the time?
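
To make that pattern easier to see, here is a minimal sketch in Python that summarises the history of "up" per target. It assumes the standard Prometheus HTTP API endpoint /api/v1/query_range is reachable; the function names, the base URL and the time range are hypothetical placeholders, not anything taken from your setup.

```python
import json
import urllib.parse
import urllib.request


def fetch_up_history(base_url, start, end, step="60s"):
    """Fetch the `up` series over a time range via /api/v1/query_range.

    base_url is a placeholder such as "http://prometheus-1:9090" (assumed).
    start and end are Unix timestamps or RFC 3339 strings.
    """
    params = urllib.parse.urlencode(
        {"query": "up", "start": start, "end": end, "step": step}
    )
    with urllib.request.urlopen(f"{base_url}/api/v1/query_range?{params}") as resp:
        return json.load(resp)


def failure_ratio_by_target(response):
    """Map (job, instance) to the fraction of samples where up == 0."""
    ratios = {}
    for series in response["data"]["result"]:
        labels = series["metric"]
        key = (labels.get("job", ""), labels.get("instance", ""))
        # Each sample is a [timestamp, "value"] pair; values arrive as strings.
        values = [float(value) for _, value in series["values"]]
        if values:
            ratios[key] = sum(1 for v in values if v == 0) / len(values)
    return ratios


# Example use (hypothetical URL and timestamps):
#   history = fetch_up_history("http://prometheus-2:9090", 1693800000, 1693886400)
#   for target, ratio in sorted(failure_ratio_by_target(history).items()):
#       print(target, f"{ratio:.0%} of scrapes failed")
```

A ratio of 1.0 points at a target that fails every time, while small non-zero ratios spread over many targets suggest an intermittent problem.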

If you compare the Targets page on both servers, are they scraping exactly 
the same URLs?  (That is, check whether service discovery is giving 
different results)
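
The same comparison can be scripted rather than done by eye. This is only a sketch, assuming both servers expose the standard /api/v1/targets endpoint; the helper names and the two base URLs are hypothetical.

```python
import json
import urllib.request


def fetch_active_targets(base_url):
    """Return the activeTargets list from /api/v1/targets on one server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets?state=active") as resp:
        return json.load(resp)["data"]["activeTargets"]


def compare_targets(targets_a, targets_b):
    """Compare two activeTargets lists by scrape URL.

    Returns (only_a, only_b, health_differs): URLs known only to server A,
    URLs known only to server B, and shared URLs whose health differs.
    """
    by_url_a = {t["scrapeUrl"]: t for t in targets_a}
    by_url_b = {t["scrapeUrl"]: t for t in targets_b}
    only_a = sorted(by_url_a.keys() - by_url_b.keys())
    only_b = sorted(by_url_b.keys() - by_url_a.keys())
    health_differs = sorted(
        url
        for url in by_url_a.keys() & by_url_b.keys()
        if by_url_a[url]["health"] != by_url_b[url]["health"]
    )
    return only_a, only_b, health_differs


# Example use (hypothetical URLs):
#   a = fetch_active_targets("http://prometheus-1:9090")
#   b = fetch_active_targets("http://prometheus-2:9090")
#   only_a, only_b, health_differs = compare_targets(a, b)
```

Non-empty only_a or only_b lists would mean service discovery is giving the two servers different results; URLs in health_differs are scraped by both servers but succeed on only one of them.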

On Tuesday, 5 September 2023 at 06:09:55 UTC+1 Анастасия Зель wrote:

> Yes, I see the errors on the Targets page in the web interface.
> I tried increasing the timeout to 5 minutes and it changed nothing.
> It's strange, because Prometheus 2 always gets this error on similar pods,
> and Prometheus 1 never gets these errors on those pods.
> On Monday, 4 September 2023 at 19:00:32 UTC+4, Brian Candler wrote:
>
>> On Monday, 4 September 2023 at 15:49:25 UTC+1 Анастасия Зель wrote:
>>
>> Hello, we use HA Prometheus with two servers.
>>
>> You mean, two Prometheus servers with the same config, both scraping the 
>> same targets?
>>
>>  
>>
>> The problem is that we get different metrics in dashboards from these two 
>> servers.
>>
>> Small differences are to be expected.  That's because the two servers 
>> won't be scraping the targets at the same points in time.  If you see more 
>> significant differences, then please provide some examples.
>>
>>  
>>
>> We also scrape metrics from k8s, and some pods are not being scraped 
>> because of the error "context deadline exceeded".
>>
>> That basically means "scrape timed out".  The scrape hadn't completed 
>> within the "scrape_timeout:" value that you've set.  You'll need to look at 
>> your individual exporters and the failing scrape URLs: either the target is 
>> not reachable at all (e.g. firewalling or network configuration issue), or 
>> the target is taking too long to respond.
>>  
>>
>> It's different pods on each server. In the Prometheus logs we don't see 
>> any errors.
>>
>> Where *do* you see the "context deadline exceeded" errors then?
>>
>

