On Thu, Aug 17, 2023 at 4:42 AM Peter Nguyễn <[email protected]> wrote:

> Thanks for your replies.
>
> > There is nothing to handle; the instance/pod IP is required for
> uniqueness tracking. Different instances of the same pod need to be tracked
> individually. In addition, most Deployment pods are going to get newly
> generated pod names every time anyway.
>
> Then if we have a deployment with a large number of active time series,
> say 1 million, every upgrade or rollback of the deployment would cause a
> significant memory increase because the number of time series is doubled
> (2 million in this case), and Prometheus would get OOM-killed if we don't
> reserve a huge amount of memory for that scenario.
>

2 million series is no big deal; it should only take a few extra gigabytes of
memory. This is not a huge amount and is well within Prometheus's capabilities.

For reference, I have deployments that generate more than 10M series and
can use upwards of 200GiB of memory when we go through a number of deploys
quickly. After things settle down, the memory is released, but it does take
a number of hours.
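If you want to watch this churn directly, Prometheus exposes self-metrics for
it. A couple of standard queries (these metric names are built into Prometheus;
nothing custom is assumed):

```promql
# Current number of series in the head block; this is what doubles
# temporarily during a rollout:
prometheus_tsdb_head_series

# Rate of new series creation; spikes when pods are restarted and
# get new instance/pod_name label values:
rate(prometheus_tsdb_head_series_created_total[5m])
```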


> > Prometheus compacts memory every 2 hours, so old data is flushed out of
> memory.
>
> I have re-run the test with Prometheus's latest version, v2.46.0,
> capturing Prometheus memory usage via the container_memory_rss metric. To
> me, it looks like the memory is not released after the HEAD is cut to a
> persistent block.
>

> [image: prometheus_instance_ip_port_concern_latest.jpg]
>
> Do you think this is expected? If yes, could you please share with us why
> the memory is not freed up for inactive time series that are no longer in
> the HEAD block?
>

It will. Prometheus is written in Go, which is a garbage-collected language;
the runtime releases RSS memory back to the OS as it sees fit. You can see
what Go is currently using with go_memstats_alloc_bytes.
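As a rough sketch of how to compare the two views of memory (the job and
container label selectors here are assumptions that depend on how your
Prometheus pod is labeled):

```promql
# Bytes Go has actually allocated for live heap objects:
go_memstats_alloc_bytes{job="prometheus"}

# Resident set size as seen by the kernel; the Go runtime can hold on
# to freed memory for a while before returning it to the OS, so this
# can stay higher than the allocated bytes:
container_memory_rss{container="prometheus"}
```

If the gap between the two stays large long after head truncation, that is
the runtime holding memory, not series still being tracked.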


> On Wednesday, August 16, 2023 at 6:15:35 PM UTC+7 Ben Kochie wrote:
>
>> FYI, container_memory_working_set_bytes is a misleading metric. It
>> includes page cache memory, which can be reclaimed at any time but
>> improves query performance.
>>
>> If you want to know the real memory use, I would recommend using
>> container_memory_rss
>>
>> On Wed, Aug 16, 2023 at 9:31 AM Peter Nguyễn <[email protected]> wrote:
>>
>>> Hi Prometheus experts,
>>>
>>> I have a Prometheus Pod (v2.40.7) running on our Kubernetes (k8s)
>>> cluster for metric scraping from multiple k8s targets.
>>>
>>> Recently, I have observed that whenever I restart a target (a k8s Pod)
>>> or perform a Helm upgrade, the memory consumption of Prometheus keeps
>>> increasing. After investigating, I discovered that each time the pod is
>>> restarted, a new set of time series from that target is generated due to
>>> the dynamic values of `instance` and `pod_name`.
>>>
>>> The instance label value we use is in the format <pod_IP>:port, and the
>>> `pod_name` label value is the pod name. Consequently, whenever a Pod is
>>> restarted, it receives a newly allocated IP address and a new pod name
>>> (unless it is a StatefulSet Pod), resulting in new values for the
>>> instance and pod_name labels.
>>>
>>> When HEAD truncation happens and the number of time series in the HEAD
>>> block goes back down to its previous value, Prometheus memory still does
>>> not return to the level from before the target restarted. Here is the
>>> graph:
>>>
>>> [image: prometheus_instance_ip_port_concern.jpg]
>>>
>>> I am writing to seek advice on best practices for handling these label
>>> values, particularly instance. Do you have any advice on what format
>>> those label values should use so we can get rid of the memory increase
>>> every time a pod gets restarted? And is there any point, e.g. after
>>> retention is triggered, at which the memory would return to its previous
>>> level?
>>>
>>> Regards, Vu
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/prometheus-users/27961908-8362-42a7-b1ce-ab27dcece7b1n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/prometheus-users/27961908-8362-42a7-b1ce-ab27dcece7b1n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>

