[prometheus-users] Re: Prometheus High RAM Investigation

Brian Candler Thu, 10 Feb 2022 05:02:03 -0800

On Thursday, 10 February 2022 at 09:16:57 UTC [email protected] wrote:


> Number of Series  7644889
> Number of Chunks  8266039
> Number of Label Pairs  9968
> Like I mentioned above, we're getting the average Metrics Per node as 
> 8257 and we have around 300 targets now, which makes our total metrics 
> around 2,100,000.
>

I don't know how you're determining "average Metrics per node".  But you 
can get total metrics at the current time instant via a direct query:

count({__name__=~".+"})
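
To see where the series actually come from, it can help to break that count 
down per target and per metric name.  Below is a rough sketch in Python of 
how you might do that through the Prometheus HTTP API (/api/v1/query).  The 
helper names are mine, http://localhost:9090 is just a placeholder for your 
server, and I'm assuming your targets carry the usual "instance" label.  
Note that these queries touch every series, so they can be expensive on a 
large server.

```python
import json
from urllib.parse import urlencode

# PromQL for locating where the series are (adjust label names to your setup).
TOTAL_SERIES = 'count({__name__=~".+"})'
SERIES_PER_TARGET = 'count by (instance) ({__name__=~".+"})'
TOP_METRIC_NAMES = 'topk(10, count by (__name__) ({__name__=~".+"}))'


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> dict:
    """Map each result's label set (as a sorted tuple) to its sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    out = {}
    for item in payload["data"]["result"]:
        labels = tuple(sorted(item["metric"].items()))
        out[labels] = float(item["value"][1])  # value is [timestamp, "string"]
    return out


# Usage (hypothetical server address; needs network access):
#   from urllib.request import urlopen
#   with urlopen(query_url("http://localhost:9090", SERIES_PER_TARGET)) as resp:
#       per_target = parse_vector(resp.read().decode())
#   print(sum(per_target.values()) / len(per_target))  # average per target
```

If one or two metric names dominate the TOP_METRIC_NAMES result, those are 
the first place to look.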

> "Are you monitoring Kubernetes pods by any chance?"
> I'm not monitoring any pods, I connect to certain nodes that send in 
> custom metrics.

Then maybe your metrics are at fault.

If there are nearly 8 million series in your head block, i.e. seen in the 
last 2 hours or so, then you must have lots of series churn: that is 
several times your own estimate of the total.  What defines a "timeseries" 
in Prometheus is the combination of metric name and the set of labels.  If 
any label value changes - even a single label - then that creates a whole 
new timeseries.  For example:

foo{bar="aaa",baz="bbb",qux="ccc"}
foo{bar="aaa",baz="bbb",qux="ccd"}

are two completely different timeseries.  The RAM usage is determined in 
large part by the number of different timeseries seen in the last 2 hours 
or so, which are held in the "head block".  Therefore if you do something 
ill-advised, like putting a changing value in a label, you will get an 
explosion of timeseries.  Google "prometheus cardinality explosion": the 
top hit is this <https://www.robustperception.io/cardinality-is-key>.
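
To make the identity rule concrete, here is a toy model in Python.  It is 
not how Prometheus stores anything internally; it only treats a series as 
(metric name, set of label pairs) and counts the distinct ones.  The 
"request_id" label is a made-up example of a changing value.

```python
def series_key(name, labels):
    """A series is identified by its metric name plus its full label set."""
    return (name, frozenset(labels.items()))


def count_series(samples):
    """Count distinct series among (name, labels) samples."""
    return len({series_key(name, labels) for name, labels in samples})


samples = [
    ("foo", {"bar": "aaa", "baz": "bbb", "qux": "ccc"}),
    ("foo", {"bar": "aaa", "baz": "bbb", "qux": "ccd"}),  # one value differs
    ("foo", {"qux": "ccc", "baz": "bbb", "bar": "aaa"}),  # same as the first
]
print(count_series(samples))  # 2

# A label whose value changes on every event creates a new series each time.
churned = [("foo", {"bar": "aaa", "request_id": str(i)}) for i in range(1000)]
print(count_series(churned))  # 1000
```

Each of those 1000 series holds a single sample, but each still counts 
towards the number of series seen in the last couple of hours.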

Example 1:

http_requests_total{method="POST",path="/"} 1

This *might* be a reasonable metric, but only if the set of "path" values 
is limited (i.e. the exporter selects from a pre-defined set, and no random 
user-provided paths are ever shown).

Example 2:

http_requests_total{method="POST",source_ip="192.0.2.1",path="/"} 1

This one definitely isn't reasonable, because source_ip is a 
high-cardinality label and you'll be creating a separate set of timeseries 
for every source address.  Prometheus will crash and burn.
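
As a back-of-envelope check: the worst-case number of series for one metric 
is the product of the number of distinct values each label can take.  The 
counts below are invented purely for illustration (5 methods, 20 paths, 
50,000 client addresses), not taken from your setup.

```python
from math import prod


def worst_case_series(distinct_values: dict) -> int:
    """Upper bound on series for one metric: product of per-label value counts."""
    return prod(distinct_values.values())


# Hypothetical counts of distinct values per label.
example_1 = {"method": 5, "path": 20}
example_2 = {"method": 5, "path": 20, "source_ip": 50_000}

print(worst_case_series(example_1))  # 100
print(worst_case_series(example_2))  # 5000000
```

That is per metric and per target, so one unbounded label is enough to 
dwarf everything else on the server.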

If you need to store high-cardinality string values, then Prometheus is the 
wrong tool for the job.  Look at Loki or Elasticsearch.
