Thanks Ben. I have created a ticket for this here: https://github.com/prometheus/prometheus/issues/12741

On Wednesday, August 23, 2023 at 3:05:15 AM UTC+7 Ben Kochie wrote:
> Interesting, good work investigating that. Would you mind posting this
> information as a new issue?
>
> https://github.com/prometheus/prometheus/issues
>
> It could also be related to this PR:
> https://github.com/prometheus/prometheus/pull/12726
>
> On Tue, Aug 22, 2023 at 12:24 PM Peter Nguyễn <[email protected]> wrote:
>
>> Hi,
>>
>> I have tried to read the code to find the answer to the question in my
>> previous email.
>>
>> Looking at
>> https://github.com/prometheus/prometheus/blob/main/scrape/scrape.go#L1690,
>> it seems that Prometheus caches data for each time series during target
>> scraping to deal with staleness.
>>
>> However, the cached data for targets that have already disappeared does
>> not seem to be cleaned up from the scrape loop cache. It keeps growing
>> whenever targets get restarted.
>>
>> I tried to add the following code:
>> [image: prometheus_instance_ip_port_concern_latest_patch.jpg]
>>
>> Repeating the test, I can see a significant memory reduction. The memory
>> now drops much earlier, as soon as Prometheus receives the target update
>> from k8s discovery.
>>
>> [image: prometheus_instance_ip_port_concern_latest_after_fixing_loop_cache.jpg]
>>
>> Could you please have a look and see whether this is a memory leak in
>> Prometheus?
>>
>> On Friday, August 18, 2023 at 6:18:16 PM UTC+7 Peter Nguyễn wrote:
>>
>>> Thanks, Ben, for your tips on tuning GOGC.
>>>
>>> Regarding the question of why Prometheus memory does not go back to its
>>> initial level even after inactive time series have been swept out of the
>>> TSDB on reaching the retention time, do you have any comment?
>>>
>>> On Friday, August 18, 2023 at 3:54:07 PM UTC+7 Ben Kochie wrote:
>>>
>>>> And if you look, GC kicked in just after 15:20 to reduce the RSS from
>>>> 10GiB to a little over 8GiB. In your 3rd example, you're running with
>>>> about 3.5KiB of memory per head series. This is perfectly normal and
>>>> within expected results.
>>>>
>>>> Again, this is all related to Go memory garbage collection.
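The patch itself is only in the screenshot above, so its exact contents cannot be reproduced here. As a rough, self-contained sketch of the idea being described — dropping scrape-cache entries that were not seen in the most recent scrape iteration — with made-up type and function names, not Prometheus's actual ones:

```go
package main

import "fmt"

// cacheEntry is a hypothetical stand-in for a scrape cache entry,
// keyed by the series' label-set string.
type cacheEntry struct {
	ref      uint64 // storage reference for the series
	lastIter uint64 // scrape iteration in which the series was last seen
}

// gcCache drops entries that were not seen in the current iteration,
// so series belonging to disappeared targets stop accumulating.
func gcCache(cache map[string]cacheEntry, currentIter uint64) {
	for key, e := range cache {
		if e.lastIter < currentIter {
			delete(cache, key)
		}
	}
}

func main() {
	cache := map[string]cacheEntry{
		`up{pod="app-abc"}`: {ref: 1, lastIter: 1}, // old pod, gone after a restart
		`up{pod="app-xyz"}`: {ref: 2, lastIter: 2}, // new pod, seen this iteration
	}
	gcCache(cache, 2)
	fmt.Println(len(cache)) // only the live series remains
}
```

The real scrapeCache in scrape.go is considerably more involved; this only illustrates why an unbounded map keyed by series labels keeps growing across target restarts unless stale keys are deleted.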
>>>> The Go runtime does what it does.
>>>>
>>>> There are some tunables. For example, we found that in our larger
>>>> environment GOGC=50 is more appropriate for our workloads than the Go
>>>> default of GOGC=100. This should reduce the RSS to around 1.5x
>>>> go_memstats_alloc_bytes.
>>>>
>>>> On Fri, Aug 18, 2023 at 10:29 AM Peter Nguyễn <[email protected]> wrote:
>>>>
>>>>> > 2 million series is no big deal, should only take a few extra
>>>>> gigabytes of memory. This is not a huge amount and well within
>>>>> Prometheus capability.
>>>>>
>>>>> 1) I have performed another test with 1M active time series. The
>>>>> memory usage of Prometheus with 1M series is around 3 billion bytes in
>>>>> my environment. I then restarted the target at around 18:10; the
>>>>> number of time series in the HEAD block jumped up to 2M, and the RAM
>>>>> usage was around 5 billion bytes, a *66% increase* over the prior
>>>>> point.
>>>>>
>>>>> [image: prometheus_instance_ip_port_concern_latest_v3.jpg]
>>>>>
>>>>> Looking at `go_memstats_alloc_bytes`, the number of allocated bytes
>>>>> goes down at HEAD truncation, but Prometheus's RSS does not.
>>>>>
>>>>> 2) I then left the deployment running overnight to see whether the
>>>>> memory would go back to the previous low point. Here is what I got:
>>>>>
>>>>> [image: prometheus_instance_ip_port_concern_latest_v5.jpg]
>>>>>
>>>>> a) The memory did not go back to its 3-billion-byte level. I set the
>>>>> retention time to 4h, so inactive time series should have been swept
>>>>> out. I am confused why the memory does not return to its low point.
>>>>> Does Prometheus keep any info related to inactive time series in
>>>>> memory?
>>>>>
>>>>> b) When I restarted the target again at 09:38, the memory kept jumping
>>>>> up. The current value is 6.7 billion bytes, almost a 100% increase
>>>>> over the previous value.
>>>>>
>>>>> 3) When I restarted the target one more time, while the HEAD block was
>>>>> not yet truncated, the memory jumped up to 10 billion bytes.
>>>>> This is a huge memory increase for us compared to the starting point.
>>>>>
>>>>> [image: prometheus_instance_ip_port_concern_latest_v6.jpg]
>>>>>
>>>>> On Thursday, August 17, 2023 at 10:34:52 AM UTC+7 Ben Kochie wrote:
>>>>>
>>>>>> On Thu, Aug 17, 2023 at 4:42 AM Peter Nguyễn <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for your replies.
>>>>>>>
>>>>>>> > There is nothing to handle, the instance/pod IP is required for
>>>>>>> uniqueness tracking. Different instances of the same pod need to be
>>>>>>> tracked individually. In addition, most Deployment pods are going to
>>>>>>> get newly generated pod names every time anyway.
>>>>>>>
>>>>>>> Then if we have a deployment with a large number of active time
>>>>>>> series, say 1 million, every upgrade or rollback of the deployment
>>>>>>> would cause a significant memory increase, because the number of
>>>>>>> time series is doubled (2 million in this case), and Prometheus
>>>>>>> would get OOM-killed if we don't reserve a huge amount of memory for
>>>>>>> that scenario.
>>>>>>
>>>>>> 2 million series is no big deal; it should only take a few extra
>>>>>> gigabytes of memory. This is not a huge amount and is well within
>>>>>> Prometheus's capability.
>>>>>>
>>>>>> For reference, I have deployments that generate more than 10M series
>>>>>> and can use upwards of 200GiB of memory when we go through a number
>>>>>> of deploys quickly. After things settle down, the memory is released,
>>>>>> but it does take a number of hours.
>>>>>>
>>>>>>> > Prometheus compacts memory every 2 hours, so old data is flushed
>>>>>>> out of memory.
>>>>>>>
>>>>>>> I have re-run the test with Prometheus's latest version, v2.46.0,
>>>>>>> capturing Prometheus memory using the container_memory_rss metric.
>>>>>>> To me, it looks like the memory does not drop after the HEAD is cut
>>>>>>> to a persistent block.
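Ben's numbers in this thread give a quick rule of thumb for capacity planning. Using his rough ~3.5 KiB per head series (a figure from this discussion, not an official constant), the doubled-series rollout scenario can be sized with simple arithmetic:

```go
package main

import "fmt"

// headMemEstimate gives a back-of-the-envelope head-block memory figure
// in GiB for a given series count and per-series cost in bytes.
func headMemEstimate(series int, bytesPerSeries float64) float64 {
	return float64(series) * bytesPerSeries / (1 << 30)
}

func main() {
	perSeries := 3.5 * 1024 // ~3.5 KiB per head series, per this thread
	// During a rollout, old and new pods briefly coexist, doubling series.
	fmt.Printf("1M series: %.1f GiB, doubled to 2M: %.1f GiB\n",
		headMemEstimate(1_000_000, perSeries),
		headMemEstimate(2_000_000, perSeries))
}
```

So a temporary jump from 1M to 2M head series costing "a few extra gigabytes" is consistent with this estimate.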
>>>>>>> [image: prometheus_instance_ip_port_concern_latest.jpg]
>>>>>>>
>>>>>>> Do you think this is expected? If yes, could you please share with
>>>>>>> us why the memory is not freed up for inactive time series that are
>>>>>>> no longer in the HEAD block?
>>>>>>
>>>>>> It will be. Prometheus is written in Go, which is a garbage-collected
>>>>>> language, and it will release RSS memory as it needs to. You can see
>>>>>> what Go is currently using with go_memstats_alloc_bytes.
>>>>>>
>>>>>>> On Wednesday, August 16, 2023 at 6:15:35 PM UTC+7 Ben Kochie wrote:
>>>>>>>
>>>>>>>> FYI, container_memory_working_set_bytes is a misleading metric. It
>>>>>>>> includes page cache memory, which can be reclaimed at any time but
>>>>>>>> improves query performance.
>>>>>>>>
>>>>>>>> If you want to know the real memory use, I would recommend using
>>>>>>>> container_memory_rss.
>>>>>>>>
>>>>>>>> On Wed, Aug 16, 2023 at 9:31 AM Peter Nguyễn <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Prometheus experts,
>>>>>>>>>
>>>>>>>>> I have a Prometheus Pod (v2.40.7) running on our Kubernetes (k8s)
>>>>>>>>> cluster, scraping metrics from multiple k8s targets.
>>>>>>>>>
>>>>>>>>> Recently, I have observed that whenever I restart a target (a k8s
>>>>>>>>> Pod) or perform a Helm upgrade, the memory consumption of
>>>>>>>>> Prometheus keeps increasing. After investigating, I discovered
>>>>>>>>> that each time the pod gets restarted, a new set of time series
>>>>>>>>> from that target is generated, due to the dynamic values of
>>>>>>>>> `instance` and `pod_name`.
>>>>>>>>>
>>>>>>>>> The instance label value we use is in the format <pod_IP>:port,
>>>>>>>>> and the `pod_name` label value is the pod name.
>>>>>>>>> Consequently, whenever a Pod is restarted, it receives a newly
>>>>>>>>> allocated IP address and a new pod name (unless it is a
>>>>>>>>> StatefulSet's Pod), resulting in new values for the instance and
>>>>>>>>> pod_name labels.
>>>>>>>>>
>>>>>>>>> When HEAD truncation comes and the number of time series in the
>>>>>>>>> HEAD block goes back to its previous low value, Prometheus's
>>>>>>>>> memory still does not go back to the point before the target was
>>>>>>>>> restarted. Here is the graph:
>>>>>>>>>
>>>>>>>>> [image: prometheus_instance_ip_port_concern.jpg]
>>>>>>>>>
>>>>>>>>> I am writing to seek advice on best practices for handling these
>>>>>>>>> label values, particularly for instance. Do you have any advice on
>>>>>>>>> what the value format for those labels should be, so that we get
>>>>>>>>> rid of the memory increase every time a pod gets restarted? Or so
>>>>>>>>> that at some point, e.g. after retention is triggered, the memory
>>>>>>>>> goes back to its previous level?
>>>>>>>>>
>>>>>>>>> Regards, Vu
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "Prometheus Users" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/prometheus-users/27961908-8362-42a7-b1ce-ab27dcece7b1n%40googlegroups.com

