When you say "measured by Kubernetes", which metric specifically? Several of the container memory metrics are misleading. What matters is `container_memory_rss` or `container_memory_working_set_bytes`. `container_memory_usage_bytes` is misleading because it includes page cache.
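As a rough illustration of why the choice of metric matters, here is a minimal Python sketch of how the working set relates to usage. It assumes the cAdvisor behaviour of deriving the working set by subtracting inactive file-backed memory (page cache) from usage; the function name and all byte values are hypothetical, not taken from your cluster.

```python
# Sketch: how container_memory_usage_bytes can overstate real memory pressure.
# All byte values below are hypothetical, chosen only for illustration.

GIB = 1024 ** 3


def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Approximate container_memory_working_set_bytes.

    Assumption: the working set is usage minus inactive file-backed
    (page cache) memory, clamped at zero.
    """
    return max(usage_bytes - inactive_file_bytes, 0)


# Hypothetical pod: 5 GiB reported usage, 3 GiB of it inactive page cache.
usage = 5 * GIB
inactive_file = 3 * GIB

ws = working_set_bytes(usage, inactive_file)
print(f"usage:       {usage / GIB:.1f} GiB")
print(f"working set: {ws / GIB:.1f} GiB")
```

If the gap between the two numbers is large on your instance, most of the "usage" is reclaimable page cache rather than memory Prometheus itself is holding.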
On Tue, Jan 24, 2023 at 10:20 AM Victor H <[email protected]> wrote:

> Hi,
>
> We are running multiple Prometheus instances in Kubernetes (deployed using
> Prometheus Operator) and hope that someone can help us understanding why
> the RAM usage in a few of our instances are unexpectedly high (we think
> it's cardinality but not sure where to look)
>
> In Prometheus A, we have the following stat:
>
> Number of Series: 56486
> Number of Chunks: 56684
> Number of Label Pairs: 678
>
> tsdb analyze has the following result:
>
> /bin $ ./promtool tsdb analyze /prometheus/
> Block ID: 01GQGMKZAF548DPE2DFZTF1TRW
> Duration: 1h59m59.368s
> Series: 56470
> Label names: 26
> Postings (unique label pairs): 678
> Postings entries (total label pairs): 338705
>
> This instance uses roughly between 4Gb - 5Gb of RAM (measured by
> Kubernetes).
>
> From our reading, each time series should use around 8kb of RAM so for 56k
> series should be using a mere 500Mb.
>
> On a different Prometheus instance (let's call it Prometheus Central) we
> have 1,1m series and it's using 9Gb - 10Gb which is roughly what is
> expected.
>
> We're curious about this instance and we believe it's cardinality. We have
> a lot more targets in Prometheus A. I also note that the Posting entries
> (total label pairs) is 338k but I'm not sure where to look for this.
>
> The top entries from tsdb analyze is right at the bottom of this post. The
> "most common label pairs" entries have alarmingly high count, I wonder if
> this contributes the high "total label pairs" and consequently higher than
> expected RAM usage.
>
> When calculating the expected RAM usage, is the "total label pairs" is the
> number we need to use rather than the "total series"
>
> Thanks,
> Victor
>
>
> Label pairs most involved in churning:
> 296 activity_type=none
> 258 workflow_type=PodUpdateWorkflow
> 163 __name__=temporal_request_latency_bucket
> 104 workflow_type=GenerateSPVarsWorkflow
> 95 operation=RespondActivityTaskCompleted
> 89 __name__=temporal_activity_execution_latency_bucket
> 89 __name__=temporal_activity_schedule_to_start_latency_bucket
> 65 workflow_type=PodInitWorkflow
> 53 operation=RespondWorkflowTaskCompleted
> 49 __name__=temporal_workflow_endtoend_latency_bucket
> 49 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
> 49 __name__=temporal_workflow_task_execution_latency_bucket
> 49 __name__=temporal_workflow_task_replay_latency_bucket
> 39 activity_type=UpdatePodConnectionsActivity
> 38 le=+Inf
> 38 le=0.02
> 38 le=0.1
> 38 le=0.001
> 38 activity_type=GenerateSPVarsActivity
> 38 le=5
>
> Label names most involved in churning:
> 734 __name__
> 734 job
> 724 instance
> 577 activity_type
> 577 workflow_type
> 541 le
> 177 operation
> 95 datname
> 53 datid
> 31 mode
> 29 namespace
> 21 state
> 12 quantile
> 11 container
> 11 service
> 11 pod
> 11 endpoint
> 10 scrape_job
> 4 alertname
> 4 severity
>
> Most common label pairs:
> 23012 activity_type=none
> 20060 workflow_type=PodUpdateWorkflow
> 12712 __name__=temporal_request_latency_bucket
> 8092 workflow_type=GenerateSPVarsWorkflow
> 7440 operation=RespondActivityTaskCompleted
> 6944 __name__=temporal_activity_execution_latency_bucket
> 6944 __name__=temporal_activity_schedule_to_start_latency_bucket
> 5100 workflow_type=PodInitWorkflow
> 4140 operation=RespondWorkflowTaskCompleted
> 3864 __name__=temporal_workflow_task_replay_latency_bucket
> 3864 __name__=temporal_workflow_endtoend_latency_bucket
> 3864 __name__=temporal_workflow_task_schedule_to_start_latency_bucket
> 3864 __name__=temporal_workflow_task_execution_latency_bucket
> 3080 activity_type=UpdatePodConnectionsActivity
> 3004 le=0.5
> 3004 le=0.01
> 3004 le=0.1
> 3004 le=1
> 3004 le=0.001
> 3004 le=0.002
>
> Label names with highest cumulative label value length:
> 8312 scrape_job
> 4279 workflow_type
> 3994 rule_group
> 2614 __name__
> 2478 instance
> 1564 job
> 434 datname
> 248 activity_type
> 139 mode
> 128 operation
> 109 version
> 97 pod
> 88 state
> 68 service
> 45 le
> 44 namespace
> 43 slice
> 31 container
> 28 quantile
> 18 alertname
>
> Highest cardinality labels:
> 138 instance
> 138 scrape_job
> 84 __name__
> 75 workflow_type
> 71 datname
> 70 job
> 19 rule_group
> 14 le
> 10 activity_type
> 9 mode
> 9 quantile
> 6 state
> 6 operation
> 5 datid
> 4 slice
> 2 container
> 2 pod
> 2 alertname
> 2 version
> 2 service
>
> Highest cardinality metric names:
> 12712 temporal_request_latency_bucket
> 6944 temporal_activity_execution_latency_bucket
> 6944 temporal_activity_schedule_to_start_latency_bucket
> 3864 temporal_workflow_task_schedule_to_start_latency_bucket
> 3864 temporal_workflow_task_replay_latency_bucket
> 3864 temporal_workflow_task_execution_latency_bucket
> 3864 temporal_workflow_endtoend_latency_bucket
> 2448 pg_locks_count
> 1632 pg_stat_activity_count
> 908 temporal_request
> 690 prometheus_target_sync_length_seconds
> 496 temporal_activity_execution_latency_count
> 350 go_gc_duration_seconds
> 340 pg_stat_database_tup_inserted
> 340 pg_stat_database_temp_bytes
> 340 pg_stat_database_xact_commit
> 340 pg_stat_database_xact_rollback
> 340 pg_stat_database_tup_updated
> 340 pg_stat_database_deadlocks
> 340 pg_stat_database_tup_returned
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/59f74cb9-3135-4fc3-a7e7-9bec02a3143an%40googlegroups.com

