> Also, what version(s) of prometheus are these two instances?

They are both the same: prometheus, version 2.37.0 (branch: HEAD,
revision: b41e0750abf5cc18d8233161560731de05199330)
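
(Side note, in case it helps for comparing the other instances: each
Prometheus exports its own build details as `prometheus_build_info`,
including the Go version it was compiled with. Assuming each instance
scrapes itself, something along these lines should list them in one place;
just a sketch, the labels to group by may need adjusting:)

  # One result per Prometheus server, with its version and the Go version
  # it was built with carried as labels (assumes a self-scrape)
  group by (instance, version, goversion) (prometheus_build_info)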

> The RAM usage of Prometheus depends on a number of factors. There's a
> calculator embedded in this article, but it's pretty old now:
> https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion

Thanks for this, I'll read & play around with that calculator for our
Prometheus instances (we have 9 in various clusters now).
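
In the meantime, a rough sanity check we can run against each instance
(keeping in mind that resident memory also includes heap the Go runtime
has not yet returned) is the actual resident bytes per in-memory series.
A sketch, assuming the instances scrape themselves under a
`job="prometheus"` label:

  # Resident memory divided by head (in-memory) series, per instance.
  # The job label is an assumption; adjust to however the instances are scraped.
  process_resident_memory_bytes{job="prometheus"}
    / prometheus_tsdb_head_series{job="prometheus"}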

Regards,
Victor

On Tue, 24 Jan 2023 at 21:03, Brian Candler <[email protected]> wrote:

> Also, what version(s) of prometheus are these two instances? Different
> versions of Prometheus are compiled using different versions of Go, which
> in turn have different degrees of aggressiveness in returning unused RAM
> to the operating system. Also remember Go is a garbage-collected language.
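
Good point about the garbage collector. To see how much of the resident
memory is live Go heap versus memory the runtime simply hasn't handed back
yet, the instances' own runtime metrics should show the gap (again just a
sketch, assuming a `job="prometheus"` self-scrape):

  # Heap currently in use by the Go runtime inside Prometheus
  go_memstats_heap_inuse_bytes{job="prometheus"}

  # Resident memory as the OS sees it (includes idle heap not yet returned)
  process_resident_memory_bytes{job="prometheus"}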

> The RAM usage of Prometheus depends on a number of factors. There's a
> calculator embedded in this article, but it's pretty old now:
>
> https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
>
> On Tuesday, 24 January 2023 at 09:29:47 UTC [email protected] wrote:
>
>> When you say "measured by Kubernetes", what metric specifically?
>>
>> There are several misleading metrics. What matters is
>> `container_memory_rss` or `container_memory_working_set_bytes`. The
>> `container_memory_usage_bytes` metric is misleading because it includes
>> page cache values.
>>
>> On Tue, Jan 24, 2023 at 10:20 AM Victor H <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We are running multiple Prometheus instances in Kubernetes (deployed
>>> using Prometheus Operator) and hope that someone can help us understand
>>> why the RAM usage of a few of our instances is unexpectedly high (we
>>> think it's cardinality, but we're not sure where to look).
>>>
>>> In Prometheus A, we have the following stats:
>>>
>>> Number of Series: 56486
>>> Number of Chunks: 56684
>>> Number of Label Pairs: 678
>>>
>>> tsdb analyze gives the following result:
>>>
>>> /bin $ ./promtool tsdb analyze /prometheus/
>>> Block ID: 01GQGMKZAF548DPE2DFZTF1TRW
>>> Duration: 1h59m59.368s
>>> Series: 56470
>>> Label names: 26
>>> Postings (unique label pairs): 678
>>> Postings entries (total label pairs): 338705
>>>
>>> This instance uses roughly between 4 GB and 5 GB of RAM (measured by
>>> Kubernetes).
>>>
>>> From our reading, each time series should use around 8 KB of RAM, so
>>> 56k series should be using a mere 500 MB.
>>>
>>> On a different Prometheus instance (let's call it Prometheus Central)
>>> we have 1.1m series and it's using 9 GB - 10 GB, which is roughly what
>>> is expected.
>>>
>>> We're curious about this instance, and we believe it's cardinality. We
>>> have a lot more targets in Prometheus A. I also note that the Postings
>>> entries (total label pairs) figure is 338k, but I'm not sure where to
>>> look for this.
>>>
>>> The top entries from tsdb analyze are right at the bottom of this post.
>>> The "most common label pairs" entries have alarmingly high counts; I
>>> wonder if this contributes to the high "total label pairs" and
>>> consequently the higher than expected RAM usage.
>>>
>>> When calculating the expected RAM usage, is the "total label pairs" the
>>> number we need to use rather than the "total series"?
>>>
>>> Thanks,
>>> Victor
>>>
>>>
>>> Label pairs most involved in churning:
>>> 296 activity_type=none
>>> 258 workflow_type=PodUpdateWorkflow
>>> 163 __name__=temporal_request_latency_bucket
>>> 104 workflow_type=GenerateSPVarsWorkflow
>>> 95 operation=RespondActivityTaskCompleted
>>> 89 __name__=temporal_activity_execution_latency_bucket
>>> 89 __name__=temporal_activity_schedule_to_start_latency_bucket
>>> 65 workflow_type=PodInitWorkflow
>>> 53 operation=RespondWorkflowTaskCompleted
>>> 49 __name__=temporal_workflow_endtoend_latency_bucket
>>> 49 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
>>> 49 __name__=temporal_workflow_task_execution_latency_bucket
>>> 49 __name__=temporal_workflow_task_replay_latency_bucket
>>> 39 activity_type=UpdatePodConnectionsActivity
>>> 38 le=+Inf
>>> 38 le=0.02
>>> 38 le=0.1
>>> 38 le=0.001
>>> 38 activity_type=GenerateSPVarsActivity
>>> 38 le=5
>>>
>>> Label names most involved in churning:
>>> 734 __name__
>>> 734 job
>>> 724 instance
>>> 577 activity_type
>>> 577 workflow_type
>>> 541 le
>>> 177 operation
>>> 95 datname
>>> 53 datid
>>> 31 mode
>>> 29 namespace
>>> 21 state
>>> 12 quantile
>>> 11 container
>>> 11 service
>>> 11 pod
>>> 11 endpoint
>>> 10 scrape_job
>>> 4 alertname
>>> 4 severity
>>>
>>> Most common label pairs:
>>> 23012 activity_type=none
>>> 20060 workflow_type=PodUpdateWorkflow
>>> 12712 __name__=temporal_request_latency_bucket
>>> 8092 workflow_type=GenerateSPVarsWorkflow
>>> 7440 operation=RespondActivityTaskCompleted
>>> 6944 __name__=temporal_activity_execution_latency_bucket
>>> 6944 __name__=temporal_activity_schedule_to_start_latency_bucket
>>> 5100 workflow_type=PodInitWorkflow
>>> 4140 operation=RespondWorkflowTaskCompleted
>>> 3864 __name__=temporal_workflow_task_replay_latency_bucket
>>> 3864 __name__=temporal_workflow_endtoend_latency_bucket
>>> 3864 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
>>> 3864 __name__=temporal_workflow_task_execution_latency_bucket
>>> 3080 activity_type=UpdatePodConnectionsActivity
>>> 3004 le=0.5
>>> 3004 le=0.01
>>> 3004 le=0.1
>>> 3004 le=1
>>> 3004 le=0.001
>>> 3004 le=0.002
>>>
>>> Label names with highest cumulative label value length:
>>> 8312 scrape_job
>>> 4279 workflow_type
>>> 3994 rule_group
>>> 2614 __name__
>>> 2478 instance
>>> 1564 job
>>> 434 datname
>>> 248 activity_type
>>> 139 mode
>>> 128 operation
>>> 109 version
>>> 97 pod
>>> 88 state
>>> 68 service
>>> 45 le
>>> 44 namespace
>>> 43 slice
>>> 31 container
>>> 28 quantile
>>> 18 alertname
>>>
>>> Highest cardinality labels:
>>> 138 instance
>>> 138 scrape_job
>>> 84 __name__
>>> 75 workflow_type
>>> 71 datname
>>> 70 job
>>> 19 rule_group
>>> 14 le
>>> 10 activity_type
>>> 9 mode
>>> 9 quantile
>>> 6 state
>>> 6 operation
>>> 5 datid
>>> 4 slice
>>> 2 container
>>> 2 pod
>>> 2 alertname
>>> 2 version
>>> 2 service
>>>
>>> Highest cardinality metric names:
>>> 12712 temporal_request_latency_bucket
>>> 6944 temporal_activity_execution_latency_bucket
>>> 6944 temporal_activity_schedule_to_start_latency_bucket
>>> 3864 temporal_workflow_task_schedule_to_start_latency_bucket
>>> 3864 temporal_workflow_task_replay_latency_bucket
>>> 3864 temporal_workflow_task_execution_latency_bucket
>>> 3864 temporal_workflow_endtoend_latency_bucket
>>> 2448 pg_locks_count
>>> 1632 pg_stat_activity_count
>>> 908 temporal_request
>>> 690 prometheus_target_sync_length_seconds
>>> 496 temporal_activity_execution_latency_count
>>> 350 go_gc_duration_seconds
>>> 340 pg_stat_database_tup_inserted
>>> 340 pg_stat_database_temp_bytes
>>> 340 pg_stat_database_xact_commit
>>> 340 pg_stat_database_xact_rollback
>>> 340 pg_stat_database_tup_updated
>>> 340 pg_stat_database_deadlocks
>>> 340 pg_stat_database_tup_returned
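
Coming back to the cardinality question above: the analyze output is for a
single ~2h block, so it can also be worth checking the live head block
directly. A sketch of the kind of queries I mean, run against Prometheus A
itself; the results should roughly line up with the "Highest cardinality
metric names" list:

  # The ten metric names holding the most series in the head block right now
  topk(10, count by (__name__) ({__name__=~".+"}))

  # Total head series (should match prometheus_tsdb_head_series)
  count({__name__=~".+"})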

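
And for reference, on the "what metric specifically" question above, the
three cadvisor metrics to compare for the Prometheus A pod would be
something like the following; the namespace/pod/container values are
placeholders, assuming cadvisor metrics are scraped via the kubelet:

  # working_set is roughly usage minus inactive page cache; usage includes page cache
  container_memory_working_set_bytes{namespace="monitoring",pod=~"prometheus-a.*",container="prometheus"}
  container_memory_rss{namespace="monitoring",pod=~"prometheus-a.*",container="prometheus"}
  container_memory_usage_bytes{namespace="monitoring",pod=~"prometheus-a.*",container="prometheus"}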
