Note that setting the scrape timeout longer than the scrape interval won't achieve anything.

I'd suggest you investigate by looking at the history of the "up" metric: this will go to zero on scrape failures. Can you discern a pattern? Is it only on a certain type of target, or targets running on a particular k8s node? Is it intermittent across all targets, or are there some targets which fail 100% of the time?

If you compare the Targets page on both servers, are they scraping exactly the same URLs? (That is, check whether service discovery is giving different results.)

On Tuesday, 5 September 2023 at 06:09:55 UTC+1 Анастасия Зель wrote:

> Yes, I see errors on the Targets page in the web interface.
> I tried to increase the timeout to 5 minutes and it changes nothing.
> It's strange because prometheus 2 always gets this error on similar pods.
> And prometheus 1 never gets these errors on these pods.
>
> On Monday, 4 September 2023 at 19:00:32 UTC+4, Brian Candler wrote:
>
>> On Monday, 4 September 2023 at 15:49:25 UTC+1 Анастасия Зель wrote:
>>
>> Hello, we use HA prometheus with two servers.
>>
>> You mean, two Prometheus servers with the same config, both scraping the
>> same targets?
>>
>> The problem is we get different metrics in dashboards from these two
>> servers.
>>
>> Small differences are to be expected. That's because the two servers
>> won't be scraping the targets at the same points in time. If you see more
>> significant differences, then please provide some examples.
>>
>> And we also scrape metrics from k8s, and some pods are not being scraped
>> because of the error "context deadline exceeded"
>>
>> That basically means "scrape timed out". The scrape hadn't completed
>> within the "scrape_timeout:" value that you've set. You'll need to look at
>> your individual exporters and the failing scrape URLs: either the target is
>> not reachable at all (e.g. firewalling or network configuration issue), or
>> the target is taking too long to respond.
>>
>> It's different pods on each server. In the prometheus logs we don't see any
>> errors.
>>
>> Where *do* you see the "context deadline exceeded" errors then?
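
P.S. If comparing the two Targets pages by eye is tedious, here is a rough Python sketch of the same check. It assumes both servers expose the standard HTTP API endpoint /api/v1/targets and relies on the "activeTargets", "scrapeUrl", "health" and "lastError" fields of its response. The function names are made up for illustration, and the server URLs in the usage comment are placeholders, not your real hosts.

```python
import json
import urllib.request


def fetch_active_targets(base_url, timeout=10):
    """Return the activeTargets list from one server's /api/v1/targets."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["activeTargets"]


def diff_scrape_urls(targets_a, targets_b):
    """Return (only_in_a, only_in_b) as sorted lists of scrape URLs."""
    urls_a = {t["scrapeUrl"] for t in targets_a}
    urls_b = {t["scrapeUrl"] for t in targets_b}
    return sorted(urls_a - urls_b), sorted(urls_b - urls_a)


def failing_targets(targets):
    """Return (scrapeUrl, lastError) for every target whose health is not 'up'."""
    return [
        (t["scrapeUrl"], t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


# Hypothetical usage (replace the URLs with your two servers):
#   a = fetch_active_targets("http://prometheus-1.example:9090")
#   b = fetch_active_targets("http://prometheus-2.example:9090")
#   print(diff_scrape_urls(a, b))   # URLs that only one server has discovered
#   print(failing_targets(b))       # what is failing on the second server, and why
```

If the first result is non-empty, service discovery is giving the two servers different target lists. If it is empty but the failing targets differ, the problem is more likely network reachability or response time from one particular server.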

