I have a metric with high cardinality, so when I attempt to run something
like this:

    histogram_quantile(0.99, sum(rate(http_request_duration_bucket[15d])) by (le, slo, job, namespace))

I get the error `query processing would load too many samples into memory
in query execution`.
What is the appropriate way of dealing with something like this?
The label causing the high cardinality (`route`) is not needed for this
query, so I attempted to create a recording rule that drops the
unnecessary labels:

    {
      expr: 'sum(http_request_duration_bucket) by (le, slo, job, namespace, pod)',
      record: 'http_request_duration_bucket:slo',
    }
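For comparison, this is roughly what I think the rule would need to look like if the `rate()` happened inside it instead, sketched as a standard Prometheus rule file (the 5m window and the rule name are assumptions on my part, not something I have validated):

```yaml
groups:
  - name: slo-recording-rules
    rules:
      # Rate each raw bucket counter first, then aggregate away the
      # high-cardinality `route` label. The 5m window is an assumption.
      - record: job_namespace:http_request_duration_bucket:rate5m
        expr: >
          sum(rate(http_request_duration_bucket[5m]))
          by (le, slo, job, namespace, pod)
```

The quantile would then be `histogram_quantile(0.99, sum(job_namespace:http_request_duration_bucket:rate5m) by (le, slo, job, namespace))`, though I am not sure whether this alone avoids the memory error over a 15d range.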
but the issue with this approach is that I end up running `rate()` on a
`sum()` when I do the quantile calculation (which misbehaves whenever any
underlying counter resets), and I have noticed that my alerting rules are
slower to react.
Ideally I would "roll up" my aggregations with recording rules, something
like:

    [{
      expr: 'histogram_quantile(0.99, sum(rate(http_request_duration_bucket[1m])) by (le, slo, job, namespace))',
      record: 'http_request_duration:99p:1m'
    },
    {
      expr: 'quantile_over_time(0.99, http_request_duration:99p:1m[1h])',
      record: 'http_request_duration:99p:1h'
    },
    {
      expr: 'quantile_over_time(0.99, http_request_duration:99p:1h[1d])',
      record: 'http_request_duration:99p:1d'
    }]
but this produces some funky results.
What is the proper way of calculating averages, quantiles, or any other
aggregate over a large timeframe? What if I wanted to see the 99th
percentile over the course of a year?
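One idea I had for the year-long case, which I have not validated: average the recorded per-bucket rates over the whole range and only then compute the quantile, rather than taking a quantile of quantiles. The `job_namespace:http_request_duration_bucket:rate5m` series name here is hypothetical (it assumes a recording rule that captures `sum(rate(...[5m])) by (le, slo, job, namespace)`):

```promql
histogram_quantile(
  0.99,
  sum by (le, slo, job, namespace) (
    avg_over_time(job_namespace:http_request_duration_bucket:rate5m[1y])
  )
)
```

Would something like this be considered correct, or does averaging the bucket rates distort the result?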
Any help is much appreciated.
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/d8f9e221-e682-4741-96bf-b4bdb5c105e4o%40googlegroups.com.