Yeah, I think a manual scrape would be useful, but remember that these are k8s pods :) I only have the pod IP, and I can't reach it from the Prometheus node because they are in different subnets. The pod subnet has no access to the outside network, so I don't know how I can manually scrape a particular pod target from the Prometheus server.
But thank you for your guesses, I will check them out.

On Tuesday, 5 September 2023 at 15:06:30 UTC+4, Brian Candler wrote:
>
> > they fail 100% of the time on that prometheus where it's down
>
> Then you're lucky: in principle it's straightforward to debug.
> - get a shell on the affected prometheus server
> - use "curl" to do a manual scrape of the target which is down (using the
>   same URL that the Targets list shows)
> - if it fails, then you've taken Prometheus out of the equation.
>
> My best guesses would be (1) Network connectivity between the Prometheus
> server and the affected pods, or (2) service discovery is giving wrong
> information (i.e. you're scraping the wrong URL in the first place)
>
> In case (2), I note that you're getting the targets to scrape from pod
> annotations. Look carefully at the values of those annotations, and how
> they are mapped into scrape address/port/path for the affected pods.
>
> On Tuesday, 5 September 2023 at 11:45:04 UTC+1 Анастасия Зель wrote:
>
>> Actually, the targets are on different k8s nodes, but they fail 100% of
>> the time on the Prometheus where they are down.
>> I got a list of all down pod targets and noticed that the number of down
>> pods is the same on both Prometheus nodes: 306 down pod targets. But they
>> are different targets :D
>> Yes, they scrape the same pod URLs.
>>
>> On Tuesday, 5 September 2023 at 10:32:15 UTC+4, Brian Candler wrote:
>>
>>> Note that setting the scrape timeout longer than the scrape interval
>>> won't achieve anything.
>>>
>>> I'd suggest you investigate by looking at the history of the "up"
>>> metric: this will go to zero on scrape failures. Can you discern a
>>> pattern? Is it only on a certain type of target, or targets running on a
>>> particular k8s node? Is it intermittent across all targets, or some
>>> targets which fail 100% of the time?
>>>
>>> If you compare the Targets page on both servers, are they scraping
>>> exactly the same URLs? (That is, check whether service discovery is
>>> giving different results)
>>>
>>> On Tuesday, 5 September 2023 at 06:09:55 UTC+1 Анастасия Зель wrote:
>>>
>>>> Yes, I see the errors on the Targets page in the web interface.
>>>> I tried increasing the timeout to 5 minutes and it changed nothing.
>>>> It's strange, because Prometheus 2 always gets this error on the same
>>>> pods, and Prometheus 1 never gets this error on those pods.
>>>>
>>>> On Monday, 4 September 2023 at 19:00:32 UTC+4, Brian Candler wrote:
>>>>
>>>>> On Monday, 4 September 2023 at 15:49:25 UTC+1 Анастасия Зель wrote:
>>>>>
>>>>> Hello, we use HA prometheus with two servers.
>>>>>
>>>>> You mean, two Prometheus servers with the same config, both scraping
>>>>> the same targets?
>>>>>
>>>>> The problem is we get different metrics in dashboards from these two
>>>>> servers.
>>>>>
>>>>> Small differences are to be expected. That's because the two servers
>>>>> won't be scraping the targets at the same points in time. If you see
>>>>> more significant differences, then please provide some examples.
>>>>>
>>>>> And we also scrape metrics from k8s, and some pods are not being
>>>>> scraped because of the error "context deadline exceeded"
>>>>>
>>>>> That basically means "scrape timed out". The scrape hadn't completed
>>>>> within the "scrape_timeout:" value that you've set. You'll need to
>>>>> look at your individual exporters and the failing scrape URLs: either
>>>>> the target is not reachable at all (e.g. firewalling or network
>>>>> configuration issue), or the target is taking too long to respond.
>>>>>
>>>>> It's different pods on each server. In the Prometheus logs we don't
>>>>> see any errors.
>>>>>
>>>>> Where *do* you see the "context deadline exceeded" errors then?
>>>>>

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/494ada91-c4b8-4ea5-bdbc-4db440c4a40en%40googlegroups.com.
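
The manual check suggested in the thread (fetch the failing target's URL from the affected Prometheus server and see whether it answers in time) can also be sketched in Python for hosts where curl is not available. This is only an illustrative sketch and not something posted in the thread: `probe_scrape` is a hypothetical helper name, the URL in the usage comment is a placeholder, and the 10 second default simply mirrors Prometheus's default `scrape_timeout`. A "timeout" result corresponds to what Prometheus reports as "context deadline exceeded".

```python
import socket
import urllib.error
import urllib.request


def probe_scrape(url: str, timeout_s: float = 10.0) -> str:
    """Fetch a metrics URL once and classify the outcome.

    Hypothetical helper. Returns "ok", "http_error:<code>", "timeout",
    or "unreachable:<reason>".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # read the whole body, as a real scrape would
            return "ok"
    except urllib.error.HTTPError as exc:
        # The target answered, but with a non-2xx status (e.g. wrong path).
        return f"http_error:{exc.code}"
    except urllib.error.URLError as exc:
        # Connection-phase failures are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable:{exc.reason}"
    except (socket.timeout, TimeoutError):
        # A timeout while waiting for or reading the response is raised bare.
        return "timeout"
    except OSError as exc:
        # Any other socket-level failure (e.g. connection reset mid-response).
        return f"unreachable:{exc}"


# Usage (placeholder address; use the exact URL shown on the Targets page):
#   print(probe_scrape("http://10.0.0.12:8080/metrics"))
```

Run it on the Prometheus server that shows the target as down. A "timeout" or "unreachable" result from there points at network connectivity between that server and the pod, while "http_error" suggests the address, port or path derived from the pod annotations is wrong.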

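The thread also suggests comparing the Targets page on both servers, and reports the same number of down targets on each but different ones. One way to make that comparison precise is to diff the output of each server's `/api/v1/targets` HTTP API. The following is a rough sketch under assumptions: the function names are invented for illustration, while `data.activeTargets`, `scrapeUrl` and `health` are fields of that API's JSON response.

```python
from typing import Dict, Set


def down_urls(targets_response: dict) -> Set[str]:
    """Scrape URLs of active targets whose health is "down".

    `targets_response` is the decoded JSON body of GET /api/v1/targets.
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    return {t["scrapeUrl"] for t in active if t.get("health") == "down"}


def compare_down(resp_a: dict, resp_b: dict) -> Dict[str, Set[str]]:
    """Split the down targets into: only on A, only on B, on both."""
    a, b = down_urls(resp_a), down_urls(resp_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b}
```

Each response can be obtained by loading the JSON returned from `/api/v1/targets` on the respective server. If "both" is empty while "only_a" and "only_b" are the same size, every target is reachable from at least one server, which fits a per-server network path problem better than a broken exporter.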
