On 2023-03-25 07:30, Kevin Z wrote:
Hi,

We have a server that has a high cardinality of metrics, mainly due to
a label that is tagged on the majority of the metrics. However, most
of our dashboards/queries don't use this label, and just use aggregate
queries. There are specific scenarios where we would need to debug and
sort based on the label, but this doesn't happen that often.

Is it a common design pattern to separate out two metrics endpoints,
one for aggregates, one for labelled metrics, with different scrape
intervals? This way we could limit the impact of the high cardinality
time series, by scraping the labelled metrics less frequently.

Couple of follow-up questions:
- When a query that uses the aggregate metric comes in, does it matter
that the data is potentially duplicated between the two endpoints? How
do we ensure that it doesn't try loading all the different time series
with the label and then aggregating, and instead directly use the
aggregate metric itself?
- How could we make sure this new setup is more efficient than the old
one? What criteria/metrics would be best (query evaluation time?
amount of data ingested?)


You certainly could split things into two endpoints and scrape them at different intervals, but it is unlikely to make much, if any, difference. On the Prometheus side, extra data points within an existing time series are very low impact; the cost comes from the number of series. So you might scrape the aggregate endpoint every 30 seconds and the full data every 2 minutes (about the slowest scrape interval that is practical), which gives 4x fewer data points for the labelled series but exactly the same number of series, and so has very little memory impact.
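To put rough numbers on this, here is a minimal sketch of the arithmetic. The series count is a made-up figure for illustration, not something taken from your setup; the point is only that slowing the scrape changes the sample count while the series count stays where it was.

```python
def samples_per_hour(series: int, scrape_interval_s: int) -> int:
    """Samples ingested per hour for `series` time series at a given interval."""
    return series * (3600 // scrape_interval_s)

# Hypothetical number of series carrying the high-cardinality label.
LABELLED_SERIES = 100_000

fast = samples_per_hour(LABELLED_SERIES, 30)    # scraped every 30 seconds
slow = samples_per_hour(LABELLED_SERIES, 120)   # scraped every 2 minutes

print(f"30s interval:  {LABELLED_SERIES} series, {fast} samples/hour")
print(f"120s interval: {LABELLED_SERIES} series, {slow} samples/hour")
print(f"samples reduced {fast // slow}x, series count unchanged")
```

Every one of those series still has to be tracked and indexed whichever interval is used, which is why the slower scrape buys so little.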

You mention that there is high cardinality - that is the thing you need to fix, as it is what will be having the impact. You say there is a problematic label applied to most of the metrics. Can it be removed? What makes it problematic?
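As a rough illustration of why the label itself is what matters, the sketch below counts distinct label sets with and without one label. The label names and values are invented for the example ("request_id" just stands in for whatever your problematic label is); each distinct label set is one time series.

```python
# Hypothetical label sets for a single metric. "request_id" is a stand-in
# for the problematic label, not a name taken from the original question.
series = [
    {"method": "GET", "status": "200", "request_id": "a1"},
    {"method": "GET", "status": "200", "request_id": "b2"},
    {"method": "GET", "status": "500", "request_id": "c3"},
    {"method": "POST", "status": "200", "request_id": "d4"},
    {"method": "POST", "status": "200", "request_id": "e5"},
]

def cardinality(label_sets, drop=()):
    """Count distinct label sets after ignoring the labels named in `drop`."""
    return len({
        tuple(sorted((k, v) for k, v in ls.items() if k not in drop))
        for ls in label_sets
    })

with_label = cardinality(series)
without_label = cardinality(series, drop=("request_id",))
print(f"series with label: {with_label}, without: {without_label}")
```

With real data the gap is usually far larger, since the series count grows with every new value the label takes.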

--
Stuart Clark
