On Thu, Aug 17, 2023 at 4:42 AM Peter Nguyễn <[email protected]> wrote:
> Thanks for your replies.
>
> > There is nothing to handle, the instance/pod IP is required for
> > uniqueness tracking. Different instances of the same pod need to be
> > tracked individually. In addition, most Deployment pods are going to get
> > newly generated pod names every time anyway.
>
> Then if we have a deployment with a large number of active time series,
> like 1 million, every upgrade or rollback of the deployment would cause a
> significant memory increase because the number of time series is doubled
> (2 million in this case), and Prometheus would get OOM-killed if we don't
> reserve a huge amount of memory for that scenario.

2 million series is no big deal; it should only take a few extra gigabytes of
memory. This is not a huge amount and is well within Prometheus's capability.
For reference, I have deployments that generate more than 10M series and can
use upwards of 200GiB of memory when we go through a number of deploys
quickly. After things settle down, the memory is released, but it does take a
number of hours.

> > Prometheus compacts memory every 2 hours, so old data is flushed out of
> > memory.
>
> I have re-run the test with Prometheus's latest version, v2.46.0,
> capturing Prometheus memory using the container_memory_rss metric. To me,
> it looks like the memory is not dropped after the HEAD is cut to a
> persistent block.
>
> [image: prometheus_instance_ip_port_concern_latest.jpg]
>
> Do you think this is expected? If yes, could you please share with us why
> the memory is not freed up for inactive time series that are no longer in
> the HEAD block?

It will. Prometheus is written in Go, which is a garbage-collected language.
It will release RSS memory as it needs to. You can see what Go is currently
using with go_memstats_alloc_bytes.

> On Wednesday, August 16, 2023 at 6:15:35 PM UTC+7 Ben Kochie wrote:
>
>> FYI, container_memory_working_set_bytes is a misleading metric. It
>> includes page cache memory, which can be reclaimed at any time, but
>> improves performance of queries.
>>
>> If you want to know the real memory use, I would recommend using
>> container_memory_rss.
>>
>> On Wed, Aug 16, 2023 at 9:31 AM Peter Nguyễn <[email protected]> wrote:
>>
>>> Hi Prometheus experts,
>>>
>>> I have a Prometheus Pod (v2.40.7) running on our Kubernetes (k8s)
>>> cluster for metric scraping from multiple k8s targets.
>>>
>>> Recently, I have observed that whenever I restart a target (a k8s Pod)
>>> or perform a Helm upgrade, the memory consumption of Prometheus keeps
>>> increasing. After investigating, I discovered that each time the pod
>>> gets restarted, a new set of time series from that target is generated
>>> due to the dynamic values of `instance` and `pod_name`.
>>>
>>> The instance label value we use is in the format <pod_IP>:port, and the
>>> `pod_name` label value is the pod name. Consequently, whenever a Pod is
>>> restarted, it receives a newly allocated IP address and a new pod name
>>> (if it is not a StatefulSet's Pod), resulting in new values for the
>>> instance and pod_name labels.
>>>
>>> When HEAD truncation happens and the number of time series in the HEAD
>>> block goes back to the previous low value, Prometheus memory still does
>>> not go back to the point before the target restarted. Here is the graph:
>>>
>>> [image: prometheus_instance_ip_port_concern.jpg]
>>>
>>> I am writing to seek advice on the best practices for handling these
>>> label values, particularly the instance. Do you have any advice on what
>>> the value format should be for those labels so we get rid of the memory
>>> increase every time a pod gets restarted? At some point, e.g. after
>>> retention is triggered, would the memory go back to the previous level?
>>>
>>> Regards, Vu
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/prometheus-users/27961908-8362-42a7-b1ce-ab27dcece7b1n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/CABbyFmru934%3D6B-JZkG6Zw2MuniDuR7FywqDRvjfORiK%2B%2BTuKQ%40mail.gmail.com.
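---

[Editor's note] Ben's suggestion to compare the kernel's view of memory with Go's own accounting can be checked directly with a few PromQL queries. This is a sketch: the metric names are standard Prometheus and cAdvisor metrics, but the `job` and `pod` label matchers are assumptions about a typical Kubernetes scrape/relabel setup and will likely need adjusting.

```promql
# Number of series currently in the head block. After churned targets go
# away, this should drop back once head compaction/truncation runs
# (roughly every 2 hours).
prometheus_tsdb_head_series

# Heap memory the Go runtime considers live, i.e. what the GC is actually
# retaining inside the Prometheus process.
go_memstats_alloc_bytes{job="prometheus"}

# RSS as seen by the kernel via cAdvisor. Go returns freed memory to the
# OS lazily, so this can stay high for a while after the heap shrinks.
container_memory_rss{pod=~"prometheus-.*"}
```

If `container_memory_rss` stays high while `go_memstats_alloc_bytes` has already dropped, the gap is memory the Go runtime has freed but not yet handed back to the OS, not series still held in the head block.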

