I have a metric with high cardinality, so when I attempt to run a query like this

histogram_quantile(0.99, sum(rate(http_request_duration_bucket[15d])) by (le, slo, job, namespace))

I get the error `query processing would load too many samples into memory in query execution`.

What is the appropriate way of dealing with something like this? 

The label (`route`) that is causing the high cardinality is not needed for this query, so I attempted to create a recording rule that drops all of the unnecessary labels:

{
  expr: 'sum(http_request_duration_bucket) by (le, slo, job, namespace, pod)',
  record: 'http_request_duration_bucket:slo',
}


but the issue with this approach is that I then end up running `rate()` on a `sum()` when I do the quantile calculation, and I have noticed that my alerting rules are slower to react.
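For concreteness, the quantile query over the recorded series would then look something like this (using the `http_request_duration_bucket:slo` rule above), with `rate()` applied to a series that is already a sum rather than a raw counter:

histogram_quantile(0.99, sum(rate(http_request_duration_bucket:slo[15d])) by (le, slo, job, namespace))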


Ideally I would "roll up" my aggregations with recording rules, something like this:
[{
  expr: 'histogram_quantile(0.99, sum(rate(http_request_duration_bucket[1m])) by (le, slo, job, namespace))',
  record: 'http_request_duration:99p:1m'
},
{
  expr: 'quantile_over_time(0.99, http_request_duration:99p:1m[1h])',
  record: 'http_request_duration:99p:1h'
},
{
  expr: 'quantile_over_time(0.99, http_request_duration:99p:1h[1d])',
  record: 'http_request_duration:99p:1d'
}]


but this produces some funky results.


What is the proper way of calculating averages, quantiles, or any other aggregate over a large time frame? What if I wanted to see the 99th percentile over the course of a year?

Any help is much appreciated. 


-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d8f9e221-e682-4741-96bf-b4bdb5c105e4o%40googlegroups.com.