Actually the targets are on different k8s nodes, but they fail 100% of the time on the Prometheus server where they are down. I pulled the list of all down pod targets and noticed that the number of down targets is the same on both Prometheus servers - 306 - but they are different targets :D Yes, they scrape the same pod URLs.

On Tuesday, 5 September 2023 at 10:32:15 UTC+4, Brian Candler wrote:
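To check whether the 306 down targets really are disjoint sets, one option is to pull the down scrape URLs from each server's `/api/v1/targets` API (`.data.activeTargets[]` entries with `health == "down"`) and diff them. A minimal sketch - the helper name and the sample URLs below are mine, not from the thread:

```python
def diff_down_targets(down_a, down_b):
    """Compare the down scrape URLs reported by two Prometheus servers.

    Returns (only down on A, only down on B, down on both).
    """
    a, b = set(down_a), set(down_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)

# Example with made-up URLs; in practice each list would come from
# GET http://<prometheus>:9090/api/v1/targets
only_a, only_b, both = diff_down_targets(
    ["http://10.0.0.1:8080/metrics", "http://10.0.0.2:8080/metrics"],
    ["http://10.0.0.2:8080/metrics", "http://10.0.0.3:8080/metrics"],
)
```

If "only on A" and "only on B" are both large while "down on both" is small, that points at something local to each scraping server (e.g. network reachability from that node) rather than at the pods themselves.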
> Note that setting the scrape timeout longer than the scrape interval won't achieve anything.
>
> I'd suggest you investigate by looking at the history of the "up" metric: this will go to zero on scrape failures. Can you discern a pattern? Is it only on a certain type of target, or targets running on a particular k8s node? Is it intermittent across all targets, or some targets which fail 100% of the time?
>
> If you compare the Targets page on both servers, are they scraping exactly the same URLs? (That is, check whether service discovery is giving different results.)
>
> On Tuesday, 5 September 2023 at 06:09:55 UTC+1, Анастасия Зель wrote:
>
>> Yes, I see errors on the targets page in the web interface.
>> I tried increasing the timeout to 5 minutes and it changed nothing.
>> It's strange, because Prometheus 2 always gets this error on the same pods, and Prometheus 1 never gets this error on those pods.
>>
>> On Monday, 4 September 2023 at 19:00:32 UTC+4, Brian Candler wrote:
>>
>>> On Monday, 4 September 2023 at 15:49:25 UTC+1, Анастасия Зель wrote:
>>>
>>> Hello, we use HA Prometheus with two servers.
>>>
>>> You mean, two Prometheus servers with the same config, both scraping the same targets?
>>>
>>> The problem is we get different metrics in dashboards from these two servers.
>>>
>>> Small differences are to be expected. That's because the two servers won't be scraping the targets at the same points in time. If you see more significant differences, then please provide some examples.
>>>
>>> And we also scrape metrics from k8s, and some pods are not being scraped because of the error "context deadline exceeded".
>>>
>>> That basically means "scrape timed out". The scrape hadn't completed within the "scrape_timeout:" value that you've set. You'll need to look at your individual exporters and the failing scrape URLs: either the target is not reachable at all (e.g. a firewalling or network configuration issue), or the target is taking too long to respond.
>>>
>>> It's different pods on each server. In the Prometheus logs we don't see any errors.
>>>
>>> Where *do* you see the "context deadline exceeded" errors then?

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/805a2feb-d0ab-4f70-a308-2a2e8a58cee6n%40googlegroups.com.
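On the timeout point discussed above: Prometheus requires `scrape_timeout` to be no greater than `scrape_interval` (the config is rejected otherwise), which is why raising the timeout to 5 minutes against a shorter interval changes nothing. A sketch of the relevant config - the job name and values here are illustrative, not taken from the thread:

```yaml
# scrape_timeout must not exceed scrape_interval, or Prometheus
# rejects the configuration at load time.
scrape_configs:
  - job_name: kubernetes-pods   # example job name
    scrape_interval: 30s
    scrape_timeout: 25s         # keep this <= scrape_interval
```

If a target genuinely needs minutes to respond, the fix is usually on the exporter side (or a longer interval for that one job), not a timeout that exceeds the interval.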

