I upgraded Prometheus from 2.37.0 to 2.37.5 and I see negligible difference in memory consumption.
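To put a number on "negligible", here is a small sketch that compares the `kubectl top pod` memory readings listed in the Analysis section of this post. The dictionary layout and the `percent_change` helper are only illustrative names, and two samples per version are not a benchmark, so treat the result as a rough indication.

```python
# Memory readings in MiB, copied from the `kubectl top pod` output in this post.
readings = {
    "v2.37.0": {"trough": 8748, "peak": 12160},
    "v2.37.5": {"trough": 8338, "peak": 11698},
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Compare the same phase (trough or peak) across the two versions.
changes = {
    phase: percent_change(readings["v2.37.0"][phase], readings["v2.37.5"][phase])
    for phase in ("trough", "peak")
}

for phase, pct in changes.items():
    print(f"{phase}: {pct:+.1f}%")
```

Both the trough and the peak come out roughly 4 to 5 percent lower on v2.37.5, which is small enough to be within normal variation between samples.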
Constants:

Number of label pairs in prometheus-prometheus-my-namespace-0: 455
Number of targets in prometheus-prometheus-my-namespace-0: 392

What do you suggest we do?

# Analysis

## Version: v2.37.0

### Version: v2.37.0 - Trough

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME                                   CPU(cores)   MEMORY(bytes)
prometheus-prometheus-my-namespace-0   31m          8748Mi
```

### Version: v2.37.0 - Peak

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME                                   CPU(cores)   MEMORY(bytes)
prometheus-prometheus-my-namespace-0   31m          12160Mi
```

## Version: v2.37.5

### Version: v2.37.5 - Trough

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME                                   CPU(cores)   MEMORY(bytes)
prometheus-prometheus-my-namespace-0   31m          8338Mi
```

### Version: v2.37.5 - Peak

```sh
$ kubectl top pod prometheus-prometheus-my-namespace-0
NAME                                   CPU(cores)   MEMORY(bytes)
prometheus-prometheus-my-namespace-0   241m         11698Mi
```

On Friday, 17 February 2023 at 14:28:59 UTC+13 Omero Saienni wrote:

> I will upgrade to the LTS.
>
> I did upgrade to the latest helm chart and did see very little difference
> but I will send you all some metrics and see how we can proceed.
>
> Thanks
>
> On Thursday, 2 February 2023 at 00:07:29 UTC+13 Brian Candler wrote:
>
>> That makes sense. Hopefully the LTS support for 2.37 can be extended in
>> the meantime.
>>
>> On Wednesday, 1 February 2023 at 10:45:34 UTC Julien Pivotto wrote:
>>
>>> On 01 Feb 02:00, Brian Candler wrote:
>>> > Aside: is 2.42.0 going to be an LTS version?
>>>
>>> Hello,
>>>
>>> I have not updated the website yet, but 2.42 will not be an LTS version.
>>>
>>> My feeling is that we still need a few releases so that the native
>>> histogram and OOO ingestion "stabilizes".
>>> It is not about waiting for them to be stable, but more making sure
>>> that the eventual bugs introduced in the codebase by those two major
>>> features are noticed and fixed.
>>>
>>> > On Wednesday, 1 February 2023 at 09:35:00 UTC [email protected] wrote:
>>> >
>>> > > Or upgrade to 2.42.0. :)
>>> > >
>>> > > On Wed, Feb 1, 2023 at 9:48 AM Julien Pivotto <[email protected]> wrote:
>>> > >
>>> > >> On 24 Jan 21:43, Victor Hadianto wrote:
>>> > >> > > Also, what version(s) of prometheus are these two instances?
>>> > >> >
>>> > >> > They are both the same:
>>> > >> > prometheus, version 2.37.0 (branch: HEAD, revision:
>>> > >> > b41e0750abf5cc18d8233161560731de05199330)
>>> > >>
>>> > >> Please update to 2.37.5. There has been a memory leak fixed in 2.37.3.
>>> > >>
>>> > >> > > The RAM usage of Prometheus depends on a number of factors. There's a
>>> > >> > > calculator embedded in this article, but it's pretty old now:
>>> > >> > > https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
>>> > >> >
>>> > >> > Thanks for this, I'll read & play around with that calculator for our
>>> > >> > Prometheus instances (we have 9 in various clusters now).
>>> > >> >
>>> > >> > Regards,
>>> > >> > Victor
>>> > >> >
>>> > >> > On Tue, 24 Jan 2023 at 21:03, Brian Candler <[email protected]> wrote:
>>> > >> >
>>> > >> > > Also, what version(s) of prometheus are these two instances? Different
>>> > >> > > versions of Prometheus are compiled using different versions of Go, which
>>> > >> > > in turn have different degrees of aggressiveness in returning unused RAM to
>>> > >> > > the operating system. Also remember Go is a garbage-collected language.
>>> > >> > >
>>> > >> > > The RAM usage of Prometheus depends on a number of factors.
>>> > >> > > There's a calculator embedded in this article, but it's pretty old now:
>>> > >> > >
>>> > >> > > https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
>>> > >> > >
>>> > >> > > On Tuesday, 24 January 2023 at 09:29:47 UTC [email protected] wrote:
>>> > >> > >
>>> > >> > >> When you say "measured by Kubernetes", what metric specifically?
>>> > >> > >>
>>> > >> > >> There are several misleading metrics. What matters is
>>> > >> > >> `container_memory_rss` or `container_memory_working_set_bytes`. The
>>> > >> > >> `container_memory_usage_bytes` metric is misleading because it includes
>>> > >> > >> page cache values.
>>> > >> > >>
>>> > >> > >> On Tue, Jan 24, 2023 at 10:20 AM Victor H <[email protected]> wrote:
>>> > >> > >>
>>> > >> > >>> Hi,
>>> > >> > >>>
>>> > >> > >>> We are running multiple Prometheus instances in Kubernetes (deployed
>>> > >> > >>> using Prometheus Operator) and hope that someone can help us understand
>>> > >> > >>> why the RAM usage in a few of our instances is unexpectedly high (we think
>>> > >> > >>> it's cardinality but are not sure where to look).
>>> > >> > >>>
>>> > >> > >>> In Prometheus A, we have the following stats:
>>> > >> > >>>
>>> > >> > >>> Number of Series: 56486
>>> > >> > >>> Number of Chunks: 56684
>>> > >> > >>> Number of Label Pairs: 678
>>> > >> > >>>
>>> > >> > >>> tsdb analyze has the following result:
>>> > >> > >>>
>>> > >> > >>> /bin $ ./promtool tsdb analyze /prometheus/
>>> > >> > >>> Block ID: 01GQGMKZAF548DPE2DFZTF1TRW
>>> > >> > >>> Duration: 1h59m59.368s
>>> > >> > >>> Series: 56470
>>> > >> > >>> Label names: 26
>>> > >> > >>> Postings (unique label pairs): 678
>>> > >> > >>> Postings entries (total label pairs): 338705
>>> > >> > >>>
>>> > >> > >>> This instance uses roughly 4 GB to 5 GB of RAM (measured by
>>> > >> > >>> Kubernetes).
>>> > >> > >>>
>>> > >> > >>> From our reading, each time series should use around 8 KB of RAM, so
>>> > >> > >>> 56k series should be using a mere 500 MB.
>>> > >> > >>>
>>> > >> > >>> On a different Prometheus instance (let's call it Prometheus Central) we
>>> > >> > >>> have 1.1M series and it's using 9 GB to 10 GB, which is roughly what is
>>> > >> > >>> expected.
>>> > >> > >>>
>>> > >> > >>> We're curious about this instance and we believe it's cardinality. We
>>> > >> > >>> have a lot more targets in Prometheus A. I also note that the Postings
>>> > >> > >>> entries (total label pairs) value is 338k, but I'm not sure where to look
>>> > >> > >>> for this.
>>> > >> > >>>
>>> > >> > >>> The top entries from tsdb analyze are right at the bottom of this post.
>>> > >> > >>> The "most common label pairs" entries have alarmingly high counts. I wonder
>>> > >> > >>> if this contributes to the high "total label pairs" and consequently the
>>> > >> > >>> higher than expected RAM usage.
>>> > >> > >>>
>>> > >> > >>> When calculating the expected RAM usage, is "total label pairs" the
>>> > >> > >>> number we need to use rather than "total series"?
>>> > >> > >>>
>>> > >> > >>> Thanks,
>>> > >> > >>> Victor
>>> > >> > >>>
>>> > >> > >>> Label pairs most involved in churning:
>>> > >> > >>> 296 activity_type=none
>>> > >> > >>> 258 workflow_type=PodUpdateWorkflow
>>> > >> > >>> 163 __name__=temporal_request_latency_bucket
>>> > >> > >>> 104 workflow_type=GenerateSPVarsWorkflow
>>> > >> > >>> 95 operation=RespondActivityTaskCompleted
>>> > >> > >>> 89 __name__=temporal_activity_execution_latency_bucket
>>> > >> > >>> 89 __name__=temporal_activity_schedule_to_start_latency_bucket
>>> > >> > >>> 65 workflow_type=PodInitWorkflow
>>> > >> > >>> 53 operation=RespondWorkflowTaskCompleted
>>> > >> > >>> 49 __name__=temporal_workflow_endtoend_latency_bucket
>>> > >> > >>> 49 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
>>> > >> > >>> 49 __name__=temporal_workflow_task_execution_latency_bucket
>>> > >> > >>> 49 __name__=temporal_workflow_task_replay_latency_bucket
>>> > >> > >>> 39 activity_type=UpdatePodConnectionsActivity
>>> > >> > >>> 38 le=+Inf
>>> > >> > >>> 38 le=0.02
>>> > >> > >>> 38 le=0.1
>>> > >> > >>> 38 le=0.001
>>> > >> > >>> 38 activity_type=GenerateSPVarsActivity
>>> > >> > >>> 38 le=5
>>> > >> > >>>
>>> > >> > >>> Label names most involved in churning:
>>> > >> > >>> 734 __name__
>>> > >> > >>> 734 job
>>> > >> > >>> 724 instance
>>> > >> > >>> 577 activity_type
>>> > >> > >>> 577 workflow_type
>>> > >> > >>> 541 le
>>> > >> > >>> 177 operation
>>> > >> > >>> 95 datname
>>> > >> > >>> 53 datid
>>> > >> > >>> 31 mode
>>> > >> > >>> 29 namespace
>>> > >> > >>> 21 state
>>> > >> > >>> 12 quantile
>>> > >> > >>> 11 container
>>> > >> > >>> 11 service
>>> > >> > >>> 11 pod
>>> > >> > >>> 11 endpoint
>>> > >> > >>> 10 scrape_job
>>> > >> > >>> 4 alertname
>>> > >> > >>> 4 severity
>>> > >> > >>>
>>> > >> > >>> Most common label pairs:
>>> > >> > >>> 23012 activity_type=none
>>> > >> > >>> 20060 workflow_type=PodUpdateWorkflow
>>> > >> > >>> 12712 __name__=temporal_request_latency_bucket
>>> > >> > >>> 8092 workflow_type=GenerateSPVarsWorkflow
>>> > >> > >>> 7440 operation=RespondActivityTaskCompleted
>>> > >> > >>> 6944 __name__=temporal_activity_execution_latency_bucket
>>> > >> > >>> 6944 __name__=temporal_activity_schedule_to_start_latency_bucket
>>> > >> > >>> 5100 workflow_type=PodInitWorkflow
>>> > >> > >>> 4140 operation=RespondWorkflowTaskCompleted
>>> > >> > >>> 3864 __name__=temporal_workflow_task_replay_latency_bucket
>>> > >> > >>> 3864 __name__=temporal_workflow_endtoend_latency_bucket
>>> > >> > >>> 3864 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
>>> > >> > >>> 3864 __name__=temporal_workflow_task_execution_latency_bucket
>>> > >> > >>> 3080 activity_type=UpdatePodConnectionsActivity
>>> > >> > >>> 3004 le=0.5
>>> > >> > >>> 3004 le=0.01
>>> > >> > >>> 3004 le=0.1
>>> > >> > >>> 3004 le=1
>>> > >> > >>> 3004 le=0.001
>>> > >> > >>> 3004 le=0.002
>>> > >> > >>>
>>> > >> > >>> Label names with highest cumulative label value length:
>>> > >> > >>> 8312 scrape_job
>>> > >> > >>> 4279 workflow_type
>>> > >> > >>> 3994 rule_group
>>> > >> > >>> 2614 __name__
>>> > >> > >>> 2478 instance
>>> > >> > >>> 1564 job
>>> > >> > >>> 434 datname
>>> > >> > >>> 248 activity_type
>>> > >> > >>> 139 mode
>>> > >> > >>> 128 operation
>>> > >> > >>> 109 version
>>> > >> > >>> 97 pod
>>> > >> > >>> 88 state
>>> > >> > >>> 68 service
>>> > >> > >>> 45 le
>>> > >> > >>> 44 namespace
>>> > >> > >>> 43 slice
>>> > >> > >>> 31 container
>>> > >> > >>> 28 quantile
>>> > >> > >>> 18 alertname
>>> > >> > >>>
>>> > >> > >>> Highest cardinality labels:
>>> > >> > >>> 138 instance
>>> > >> > >>> 138 scrape_job
>>> > >> > >>> 84 __name__
>>> > >> > >>> 75 workflow_type
>>> > >> > >>> 71 datname
>>> > >> > >>> 70 job
>>> > >> > >>> 19 rule_group
>>> > >> > >>> 14 le
>>> > >> > >>> 10 activity_type
>>> > >> > >>> 9 mode
>>> > >> > >>> 9 quantile
>>> > >> > >>> 6 state
>>> > >> > >>> 6 operation
>>> > >> > >>> 5 datid
>>> > >> > >>> 4 slice
>>> > >> > >>> 2 container
>>> > >> > >>> 2 pod
>>> > >> > >>> 2 alertname
>>> > >> > >>> 2 version
>>> > >> > >>> 2 service
>>> > >> > >>>
>>> > >> > >>> Highest cardinality metric names:
>>> > >> > >>> 12712 temporal_request_latency_bucket
>>> > >> > >>> 6944 temporal_activity_execution_latency_bucket
>>> > >> > >>> 6944 temporal_activity_schedule_to_start_latency_bucket
>>> > >> > >>> 3864 temporal_workflow_task_schedule_to_start_latency_bucket
>>> > >> > >>> 3864 temporal_workflow_task_replay_latency_bucket
>>> > >> > >>> 3864 temporal_workflow_task_execution_latency_bucket
>>> > >> > >>> 3864 temporal_workflow_endtoend_latency_bucket
>>> > >> > >>> 2448 pg_locks_count
>>> > >> > >>> 1632 pg_stat_activity_count
>>> > >> > >>> 908 temporal_request
>>> > >> > >>> 690 prometheus_target_sync_length_seconds
>>> > >> > >>> 496 temporal_activity_execution_latency_count
>>> > >> > >>> 350 go_gc_duration_seconds
>>> > >> > >>> 340 pg_stat_database_tup_inserted
>>> > >> > >>> 340 pg_stat_database_temp_bytes
>>> > >> > >>> 340 pg_stat_database_xact_commit
>>> > >> > >>> 340 pg_stat_database_xact_rollback
>>> > >> > >>> 340 pg_stat_database_tup_updated
>>> > >> > >>> 340 pg_stat_database_deadlocks
>>> > >> > >>> 340 pg_stat_database_tup_returned
>>> > >> > >>>
>>> > >> > >>> --
>>> > >> > >>> You received this message because you are subscribed to the Google
>>> > >> > >>> Groups "Prometheus Users" group.
>>> > >> > >>> To unsubscribe from this group and stop receiving emails from it, send
>>> > >> > >>> an email to [email protected].
>>> > >> > >>> To view this discussion on the web visit
>>> > >> > >>> https://groups.google.com/d/msgid/prometheus-users/59f74cb9-3135-4fc3-a7e7-9bec02a3143an%40googlegroups.com.
>>> > >>
>>> > >> --
>>> > >> Julien Pivotto
>>> > >> @roidelapluie
>>>
>>> --
>>> Julien Pivotto
>>> @roidelapluie
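For anyone revisiting the sizing question from the original post, the arithmetic can be sketched as follows. The 8 KB-per-series figure is the rule of thumb quoted in the thread, not a measured value, and reading "Postings entries" as roughly series count times labels per series is my interpretation of the `promtool tsdb analyze` output. The function and variable names are made up for illustration.

```python
# Figures copied from the thread for "Prometheus A".
head_series = 56_486         # "Number of Series"
block_series = 56_470        # "Series" in the promtool tsdb analyze block
postings_entries = 338_705   # "Postings entries (total label pairs)"

# Assumption: the ~8 KB per series rule of thumb mentioned in the thread.
bytes_per_series = 8 * 1024


def estimated_series_mib(series: int, per_series_bytes: int) -> float:
    """Rough memory estimate for the given number of series, in MiB."""
    return series * per_series_bytes / (1024 * 1024)


def labels_per_series(entries: int, series: int) -> float:
    """Average number of label pairs attached to each series."""
    return entries / series


estimate = estimated_series_mib(head_series, bytes_per_series)
avg_labels = labels_per_series(postings_entries, block_series)

print(f"estimated memory for series: {estimate:.0f} MiB")
print(f"average label pairs per series: {avg_labels:.2f}")
```

With these inputs the estimate is about 441 MiB, in line with the "mere 500 MB" in the original post, and the ratio is about 6 label pairs per series. That suggests the large "total label pairs" figure is mostly the series count multiplied by the labels on each series, rather than a separate source of cardinality, so the series count remains the number to size against.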

