Thanks Ben. I have created a ticket for this here: https://github.com/prometheus/prometheus/issues/12741

On Wednesday, August 23, 2023 at 3:05:15 AM UTC+7 Ben Kochie wrote:
> Interesting, good work investigating that. Would you mind posting this
> information as a new issue?
>
> https://github.com/prometheus/prometheus/issues
>
> It could also be related to this PR:
> https://github.com/prometheus/prometheus/pull/12726
>
> On Tue, Aug 22, 2023 at 12:24 PM Peter Nguyễn <[email protected]> wrote:
>
>> Hi,
>>
>> I have tried to read the code to find the answer to the question in my
>> previous email.
>>
>> Looking at
>> https://github.com/prometheus/prometheus/blob/main/scrape/scrape.go#L1690,
>> it seems that Prometheus caches data for each time series during target
>> scraping to deal with staleness.
>>
>> However, the cached data for targets that have already disappeared does
>> not seem to be cleaned up from the scrape loop cache. It keeps growing
>> whenever targets get restarted.
>>
>> I tried to add the following code:
>> [image: prometheus_instance_ip_port_concern_latest_patch.jpg]
>>
>> Repeating the test, I can see a significant memory reduction. The memory
>> now drops much earlier, as soon as Prometheus receives the target update
>> from k8s discovery.
>>
>> [image: prometheus_instance_ip_port_concern_latest_after_fixing_loop_cache.jpg]
>>
>> Could you please have a look and see whether this is a memory leak in
>> Prometheus?
>>
>> On Friday, August 18, 2023 at 6:18:16 PM UTC+7 Peter Nguyễn wrote:
>>
>>> Thanks, Ben, for your tips on tuning GOGC.
>>>
>>> Regarding the question of why Prometheus memory does not go back to its
>>> initial level even after inactive time series have been swept out of the
>>> TSDB on reaching the retention time, do you have any comment?
>>>
>>> On Friday, August 18, 2023 at 3:54:07 PM UTC+7 Ben Kochie wrote:
>>>
>>>> And if you look, GC kicked in just after 15:20 to reduce the RSS from
>>>> 10GiB to a little over 8GiB. In your 3rd example, you're running with
>>>> about 3.5KiB of memory per head series. This is perfectly normal and
>>>> within expected results.
>>>>
>>>> Again, this is all related to Go memory garbage collection.
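The patch itself is only in the screenshot above, so its exact contents cannot be reproduced here. As a rough, self-contained sketch of the idea being described — dropping scrape-cache entries that were not seen in the most recent scrape iteration — with made-up type and function names, not Prometheus's actual ones:

```go
package main

import "fmt"

// cacheEntry is a hypothetical stand-in for a scrape cache entry,
// keyed by the series' label-set string.
type cacheEntry struct {
	ref      uint64 // storage reference for the series
	lastIter uint64 // scrape iteration in which the series was last seen
}

// gcCache drops entries that were not seen in the current iteration,
// so series belonging to disappeared targets stop accumulating.
func gcCache(cache map[string]cacheEntry, currentIter uint64) {
	for key, e := range cache {
		if e.lastIter < currentIter {
			delete(cache, key)
		}
	}
}

func main() {
	cache := map[string]cacheEntry{
		`up{pod="app-abc"}`: {ref: 1, lastIter: 1}, // old pod, gone after a restart
		`up{pod="app-xyz"}`: {ref: 2, lastIter: 2}, // new pod, seen this iteration
	}
	gcCache(cache, 2)
	fmt.Println(len(cache)) // only the live series remains
}
```

The real scrapeCache in scrape.go is considerably more involved; this only illustrates why an unbounded map keyed by series labels keeps growing across target restarts unless stale keys are deleted.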
>>>> The Go runtime does what it does.
>>>>
>>>> There are some tunables. For example, we found that in our larger
>>>> environment GOGC=50 is more appropriate for our workloads than the Go
>>>> default of GOGC=100. This should reduce the RSS to around 1.5x
>>>> go_memstats_alloc_bytes.
>>>>
>>>> On Fri, Aug 18, 2023 at 10:29 AM Peter Nguyễn <[email protected]> wrote:
>>>>
>>>>> > 2 million series is no big deal, should only take a few extra
>>>>> gigabytes of memory. This is not a huge amount and well within
>>>>> Prometheus capability.
>>>>>
>>>>> 1) I have performed another test with 1M active time series. The
>>>>> memory usage of Prometheus with 1M series is around 3 billion bytes in
>>>>> my environment. I then restarted the target at around 18:10; the
>>>>> number of time series in the HEAD block jumped up to 2M, and the RAM
>>>>> usage was around 5 billion bytes, a *66% increase* over the prior
>>>>> point.
>>>>>
>>>>> [image: prometheus_instance_ip_port_concern_latest_v3.jpg]
>>>>>
>>>>> Looking at `go_memstats_alloc_bytes`, the number of allocated bytes
>>>>> goes down at HEAD truncation, but Prometheus's RSS does not.
>>>>>
>>>>> 2) I then left the deployment running overnight to see whether the
>>>>> memory would go back to the previous low point. Here is what I got:
>>>>>
>>>>> [image: prometheus_instance_ip_port_concern_latest_v5.jpg]
>>>>>
>>>>> a) The memory did not go back to its 3-billion-byte level. I set the
>>>>> retention time to 4h, so inactive time series should have been swept
>>>>> out. I am confused why the memory does not return to its low point.
>>>>> Does Prometheus keep any info related to inactive time series in
>>>>> memory?
>>>>>
>>>>> b) When I restarted the target again at 09:38, the memory kept jumping
>>>>> up. The current value is 6.7 billion bytes, almost a 100% increase
>>>>> over the previous value.
>>>>>
>>>>> 3) When I restarted the target one more time, while the HEAD block was
>>>>> not yet truncated, the memory jumped up to 10 billion bytes.
>>>>> This is a huge memory increase for us compared to the starting point.
>>>>>
>>>>> [image: prometheus_instance_ip_port_concern_latest_v6.jpg]
>>>>>
>>>>> On Thursday, August 17, 2023 at 10:34:52 AM UTC+7 Ben Kochie wrote:
>>>>>
>>>>>> On Thu, Aug 17, 2023 at 4:42 AM Peter Nguyễn <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for your replies.
>>>>>>>
>>>>>>> > There is nothing to handle, the instance/pod IP is required for
>>>>>>> uniqueness tracking. Different instances of the same pod need to be
>>>>>>> tracked individually. In addition, most Deployment pods are going to
>>>>>>> get newly generated pod names every time anyway.
>>>>>>>
>>>>>>> Then if we have a deployment with a large number of active time
>>>>>>> series, say 1 million, every upgrade or rollback of the deployment
>>>>>>> would cause a significant memory increase, because the number of
>>>>>>> time series is doubled (2 million in this case), and Prometheus
>>>>>>> would get OOM-killed if we don't reserve a huge amount of memory for
>>>>>>> that scenario.
>>>>>>
>>>>>> 2 million series is no big deal; it should only take a few extra
>>>>>> gigabytes of memory. This is not a huge amount and is well within
>>>>>> Prometheus's capability.
>>>>>>
>>>>>> For reference, I have deployments that generate more than 10M series
>>>>>> and can use upwards of 200GiB of memory when we go through a number
>>>>>> of deploys quickly. After things settle down, the memory is released,
>>>>>> but it does take a number of hours.
>>>>>>
>>>>>>> > Prometheus compacts memory every 2 hours, so old data is flushed
>>>>>>> out of memory.
>>>>>>>
>>>>>>> I have re-run the test with Prometheus's latest version, v2.46.0,
>>>>>>> capturing Prometheus memory using the container_memory_rss metric.
>>>>>>> To me, it looks like the memory does not drop after the HEAD is cut
>>>>>>> to a persistent block.
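Ben's numbers in this thread give a quick rule of thumb for capacity planning. Using his rough ~3.5 KiB per head series (a figure from this discussion, not an official constant), the doubled-series rollout scenario can be sized with simple arithmetic:

```go
package main

import "fmt"

// headMemEstimate gives a back-of-the-envelope head-block memory figure
// in GiB for a given series count and per-series cost in bytes.
func headMemEstimate(series int, bytesPerSeries float64) float64 {
	return float64(series) * bytesPerSeries / (1 << 30)
}

func main() {
	perSeries := 3.5 * 1024 // ~3.5 KiB per head series, per this thread
	// During a rollout, old and new pods briefly coexist, doubling series.
	fmt.Printf("1M series: %.1f GiB, doubled to 2M: %.1f GiB\n",
		headMemEstimate(1_000_000, perSeries),
		headMemEstimate(2_000_000, perSeries))
}
```

So a temporary jump from 1M to 2M head series costing "a few extra gigabytes" is consistent with this estimate.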
>>>>>>> [image: prometheus_instance_ip_port_concern_latest.jpg]
>>>>>>>
>>>>>>> Do you think this is expected? If yes, could you please share with
>>>>>>> us why the memory is not freed up for inactive time series that are
>>>>>>> no longer in the HEAD block?
>>>>>>
>>>>>> It will be. Prometheus is written in Go, which is a garbage-collected
>>>>>> language, and it will release RSS memory as it needs to. You can see
>>>>>> what Go is currently using with go_memstats_alloc_bytes.
>>>>>>
>>>>>>> On Wednesday, August 16, 2023 at 6:15:35 PM UTC+7 Ben Kochie wrote:
>>>>>>>
>>>>>>>> FYI, container_memory_working_set_bytes is a misleading metric. It
>>>>>>>> includes page cache memory, which can be reclaimed at any time but
>>>>>>>> improves query performance.
>>>>>>>>
>>>>>>>> If you want to know the real memory use, I would recommend using
>>>>>>>> container_memory_rss.
>>>>>>>>
>>>>>>>> On Wed, Aug 16, 2023 at 9:31 AM Peter Nguyễn <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Prometheus experts,
>>>>>>>>>
>>>>>>>>> I have a Prometheus Pod (v2.40.7) running on our Kubernetes (k8s)
>>>>>>>>> cluster, scraping metrics from multiple k8s targets.
>>>>>>>>>
>>>>>>>>> Recently, I have observed that whenever I restart a target (a k8s
>>>>>>>>> Pod) or perform a Helm upgrade, the memory consumption of
>>>>>>>>> Prometheus keeps increasing. After investigating, I discovered
>>>>>>>>> that each time the pod gets restarted, a new set of time series
>>>>>>>>> from that target is generated, due to the dynamic values of
>>>>>>>>> `instance` and `pod_name`.
>>>>>>>>>
>>>>>>>>> The instance label value we use is in the format <pod_IP>:port,
>>>>>>>>> and the `pod_name` label value is the pod name.
>>>>>>>>> Consequently, whenever a Pod is restarted, it receives a newly
>>>>>>>>> allocated IP address and a new pod name (unless it is a
>>>>>>>>> StatefulSet's Pod), resulting in new values for the instance and
>>>>>>>>> pod_name labels.
>>>>>>>>>
>>>>>>>>> When HEAD truncation comes and the number of time series in the
>>>>>>>>> HEAD block goes back to its previous low value, Prometheus's
>>>>>>>>> memory still does not go back to the point before the target was
>>>>>>>>> restarted. Here is the graph:
>>>>>>>>>
>>>>>>>>> [image: prometheus_instance_ip_port_concern.jpg]
>>>>>>>>>
>>>>>>>>> I am writing to seek advice on best practices for handling these
>>>>>>>>> label values, particularly for instance. Do you have any advice on
>>>>>>>>> what the value format for those labels should be, so that we get
>>>>>>>>> rid of the memory increase every time a pod gets restarted? Or so
>>>>>>>>> that at some point, e.g. after retention is triggered, the memory
>>>>>>>>> goes back to its previous level?
>>>>>>>>>
>>>>>>>>> Regards, Vu
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "Prometheus Users" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/prometheus-users/27961908-8362-42a7-b1ce-ab27dcece7b1n%40googlegroups.com

