Re: [prometheus-users] Optimizing query with many duplicate calls

David Leibovic Thu, 21 Sep 2023 03:28:48 -0700

>
> Another way to improve this: I think we could add a query
> optimization that automatically detects identical expressions and
> evaluates them only once. This would be transparent to the user and
> would not require changing the language.
>


I like this idea!
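
To make the idea concrete, here is a toy Python sketch of "evaluate identical expressions once". It is not how Prometheus evaluates queries; `CachingEvaluator` and `fake_avg_over_time` are hypothetical names, and the fixed value 20.0 stands in for an expensive range query.

```python
from typing import Callable, Dict


class CachingEvaluator:
    """Toy evaluator that computes each distinct expression string once."""

    def __init__(self, compute: Callable[[str], float]) -> None:
        self._compute = compute
        self._cache: Dict[str, float] = {}
        self.calls = 0  # number of real (uncached) evaluations

    def eval(self, expr: str) -> float:
        # Identical expression text is looked up instead of recomputed.
        if expr not in self._cache:
            self.calls += 1
            self._cache[expr] = self._compute(expr)
        return self._cache[expr]


def fake_avg_over_time(expr: str) -> float:
    # Hypothetical stand-in for an expensive range query.
    return 20.0


ev = CachingEvaluator(fake_avg_over_time)
X = 'avg_over_time(foo{instance=~"$i"}[$interval])'
# The same subexpression appears four times in the simplified query.
results = [scale * ev.eval(X) for scale in (1, 2, 3, 10)]
print(results, ev.calls)
```

The cache key here is the raw expression text, which is the simplest possible notion of "identical"; a real optimizer would presumably compare parsed expressions instead.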

> How wide is `$aqi_interval` here?
>

It can vary between 10s and 24h. Most of the time I use 24h. The query gets
slower the larger the interval is.
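
A back-of-the-envelope sketch of why the range width matters. It assumes a 15s scrape interval (the thread does not state one) and that every occurrence of the range selector is evaluated independently. The counts 14 and 8 are the number of avg_over_time calls in the original and the revised query quoted further down.

```python
def samples_per_step(range_seconds: int, scrape_seconds: int, occurrences: int) -> int:
    """Samples read per series for one evaluation step, assuming no sharing."""
    return (range_seconds // scrape_seconds) * occurrences


SCRAPE = 15            # assumed scrape interval in seconds
HOUR = 60 * 60
DAY = 24 * HOUR

original_24h = samples_per_step(DAY, SCRAPE, 14)   # original query, 24h range
revised_24h = samples_per_step(DAY, SCRAPE, 8)     # lower bounds removed
revised_1h = samples_per_step(HOUR, SCRAPE, 8)     # same query, 1h range

print(original_24h, revised_24h, revised_1h)
```

Under these assumptions the work grows linearly with both the range width and the number of duplicated selectors, which matches the slowdown I see with longer intervals.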

> I'm guessing this is one of a few drop-down options, since it's not one of
> the dynamic values like $__interval.
>

Correct.

> If you pre-recorded just that, it would be much faster to query.
>

Good point - I had actually considered doing this. I had a slight aversion
to pre-recording because I would need to pre-record the averages for each
$aqi_interval I was interested in looking at, or else pre-record the longer
intervals and fall back to unrecorded calculations for the shorter
intervals. But if there is currently no better solution, then I may do this.
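
A rough sketch of what that could look like, in Python for illustration. It renders one recording rule per interval and picks between the recorded series and the raw calculation. The interval list, the group name `aqi_averages`, and the helper names are hypothetical; the rule naming follows the `instance:ambient_pm25_env:avg1d` pattern suggested in the quoted message below, with the suffix set to the literal value of $aqi_interval so that the dashboard query resolves.

```python
from typing import List

RECORDED = ["1h", "6h", "24h"]  # hypothetical subset of the $aqi_interval options


def recording_rules(metric: str, intervals: List[str]) -> str:
    """Render a Prometheus rule group with one avg_over_time rule per interval."""
    lines = ["groups:", "  - name: aqi_averages", "    rules:"]
    for interval in intervals:
        lines.append(f"      - record: instance:{metric}:avg{interval}")
        lines.append(f"        expr: avg_over_time({metric}[{interval}])")
    return "\n".join(lines) + "\n"


def query_for(metric: str, interval: str, recorded: List[str]) -> str:
    """Use the pre-recorded series when one exists, else the raw calculation."""
    if interval in recorded:
        return f'instance:{metric}:avg{interval}{{instance=~"$room.*"}}'
    return f'avg_over_time({metric}{{instance=~"$room.*"}}[{interval}])'


rules_yaml = recording_rules("ambient_pm25_env", RECORDED)
print(rules_yaml)
print(query_for("ambient_pm25_env", "24h", RECORDED))
print(query_for("ambient_pm25_env", "10s", RECORDED))
```

The fallback would have to live in the dashboard (for example as two query variants), since the choice depends on the selected interval.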

Thanks!

On Wed, Sep 20, 2023 at 12:57 AM Ben Kochie <[email protected]> wrote:

> One more question:
>
> How wide is `$aqi_interval` here? I'm guessing this is one of a few
> drop-down options, since it's not one of the dynamic values like
> $__interval.
>
> If this value is hours or days, it's likely one of the main sources of
> slowness, since it needs to load a lot of samples. If you pre-recorded just
> that, it would be much faster to query.
>
> - record: instance:ambient_pm25_env:avg1d
>   expr: avg_over_time(ambient_pm25_env[1d])
>
> Then your query would be
> instance:ambient_pm25_env:avg$aqi_interval{instance=~"$room.*"}, provided
> the rule name suffix matches the value of $aqi_interval (1d here). This
> selects an already-computed instant vector instead of scanning a range of
> raw samples, so it is much faster. You could also relabel at the same time
> to extract a room label without the full hostname:port, which I'm guessing
> is why you're using the regexp.
>
> On Wed, Sep 20, 2023 at 5:27 AM David Leibovic <[email protected]>
> wrote:
>
>> Btw, here's the new PromQL I ended up with, in case it's helpful to
>> anyone else:
>>
>> ((50 - 0) / (12 - 0) * ((avg_over_time(ambient_pm25_env{instance=~
>> "$room.*"}[$aqi_interval]) <= 12) - 0) + 0) or
>> ((100 - 51) / (35.4 - 12.1) * ((avg_over_time(ambient_pm25_env{instance=~
>> "$room.*"}[$aqi_interval]) <= 35.4) - 12.1) + 51) or
>> ((150 - 101) / (55.4 - 35.5) * ((avg_over_time(ambient_pm25_env{instance
>> =~"$room.*"}[$aqi_interval]) <= 55.4) - 35.5) + 101) or
>> ((200 - 151) / (150.4 - 55.5) * ((avg_over_time(ambient_pm25_env{instance
>> =~"$room.*"}[$aqi_interval]) <= 150.4) - 55.5) + 151) or
>> ((300 - 201) / (250.4 - 150.5) * ((avg_over_time(ambient_pm25_env{
>> instance=~"$room.*"}[$aqi_interval]) <= 250.4) - 150.5) + 201) or
>> ((400 - 301) / (350.4 - 250.5) * ((avg_over_time(ambient_pm25_env{
>> instance=~"$room.*"}[$aqi_interval]) <= 350.4) - 250.5) + 301) or
>> ((500 - 401) / (500.4 - 350.5) * ((avg_over_time(ambient_pm25_env{
>> instance=~"$room.*"}[$aqi_interval]) <= 500.4) - 350.5) + 401) or
>> clamp_max(avg_over_time(ambient_pm25_env{instance=~"$room.*"}[
>> $aqi_interval]), 600)
>>
>> On Tuesday, September 19, 2023 at 11:24:58 PM UTC-4 David Leibovic wrote:
>>
>>> Thanks very much to you both for the suggestions! Changing the regexp
>>> matching to exact string matching didn't help noticeably in my case,
>>> perhaps because I have fewer than 5 instances over which it has to do a
>>> regexp match. But removing the unnecessary lower-bound checks from my
>>> inequalities reduced loading time by about 60%.
>>>
>>> It's too bad that Prometheus doesn't support something like variables
>>> that could be repeatedly referenced in PromQL - that would probably speed
>>> things up even more.
>>>
>>> On Monday, September 18, 2023 at 3:23:43 AM UTC-4 Brian Candler wrote:
>>>
>>>> One possibility is to use a recording rule for the expensive repeated
>>>> query.
>>>>
>>>> If I rewrite
>>>> avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval])
>>>> as X, I get:
>>>>
>>>> ((50 - 0) / (12 - 0) * ((X <= 12) - 0) + 0) or
>>>> ((100 - 51) / (35.4 - 12.1) * ((X > 12 and X <= 35.4) - 12.1) + 51) or
>>>> ((150 - 101) / (55.4 - 35.5) * ((X > 35.4 and X <= 55.4) - 35.5) + 101)
>>>> or
>>>> ((200 - 151) / (150.4 - 55.5) * ((X > 55.4 and X <= 150.4) - 55.5) +
>>>> 151) or
>>>> ((300 - 201) / (250.4 - 150.5) * ((X > 150.4 and X <= 250.4) - 150.5) +
>>>> 201) or
>>>> ((400 - 301) / (350.4 - 250.5) * ((X > 250.4 and X <= 350.4) - 250.5) +
>>>> 301) or
>>>> ((500 - 401) / (500.4 - 350.5) * ((X > 350.4 and X <= 500.4) - 350.5) +
>>>> 401) or
>>>> clamp_max(X, 600)
>>>>
>>>> I guess you're trying to apply different scaling for different ranges
>>>> of X:
>>>> - if X is between 0 and 12 (or negative) then rescale to 0 to 50
>>>> - if X is between 12 and 35.4 then rescale to 50(?) to 100
>>>> - if X is between 35.4 and 55.4 then rescale to 100(?) to 150
>>>> etc (except there seem to be some small discontinuities at the
>>>> boundaries, e.g. 12 versus 12.1, 50 versus 51)
>>>>
>>>> "A or B" will suppress elements in the B vector where the A vector has
>>>> a value (i.e. with a matching label set). That means it's unnecessary to
>>>> test the lower bounds, and I think your expression could simplify to
>>>> something like this:
>>>>
>>>> (X <= 12) * k1 + o1 or
>>>> (X <= 35.4) * k2 + o2 or
>>>> (X <= 55.4) * k3 + o3 or
>>>> (X <= 150.4) * k4  + o4 or
>>>> (X <= 250.4) * k5 + o5 or
>>>> (X <= 350.4) * k6 + o6 or
>>>> (X <= 500.4) * k7 + o7 or
>>>> clamp_max(X, 600)
>>>>
>>>> That would roughly halve the number of occurrences of the subexpression X.
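
For reference, a small Python sketch of how the k and o constants could be derived from the breakpoints in the query, using k = (i_hi - i_lo) / (c_hi - c_lo) and o = i_lo - k * c_lo, which follows from expanding k * (X - c_lo) + i_lo. The `aqi` function mimics the first-match behaviour of the chained "or" for a single series; the names `BANDS`, `coefficients`, and `aqi` are made up for this sketch.

```python
from typing import List, Tuple

# (c_lo, c_hi, i_lo, i_hi): concentration and index breakpoints from the query
BANDS: List[Tuple[float, float, float, float]] = [
    (0.0, 12.0, 0.0, 50.0),
    (12.1, 35.4, 51.0, 100.0),
    (35.5, 55.4, 101.0, 150.0),
    (55.5, 150.4, 151.0, 200.0),
    (150.5, 250.4, 201.0, 300.0),
    (250.5, 350.4, 301.0, 400.0),
    (350.5, 500.4, 401.0, 500.0),
]


def coefficients(bands: List[Tuple[float, float, float, float]]) -> List[Tuple[float, float, float]]:
    """Return (upper_bound, k, o) per band so that the index is k * X + o."""
    out = []
    for c_lo, c_hi, i_lo, i_hi in bands:
        k = (i_hi - i_lo) / (c_hi - c_lo)
        o = i_lo - k * c_lo
        out.append((c_hi, k, o))
    return out


COEFFS = coefficients(BANDS)


def aqi(x: float) -> float:
    # The first band whose upper bound admits x wins, like "A or B or ...".
    for c_hi, k, o in COEFFS:
        if x <= c_hi:
            return k * x + o
    return min(x, 600.0)  # mirrors clamp_max(X, 600)


for upper, k, o in COEFFS:
    print(f"(X <= {upper}) * {k:.6f} + {o:.6f}")
```

Because each lower bound in the original query equals the previous band's upper bound, checking only the upper bounds in order selects the same band for every value of X.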
>>>>
>>>> On Monday, 18 September 2023 at 07:01:50 UTC+1 Ben Kochie wrote:
>>>>
>>>>> One thing you can do to speed things up is to eliminate the `=~` in
>>>>> your query. Using regexp matching means it has to do a string search over
>>>>> every instance in your Prometheus for each metric. Using exact matching
>>>>> (`=`) will speed things up a lot, although you won't be able to match
>>>>> multiple values if you want that in your dashboard variables.
>>>>>
>>>>> On Mon, Sep 18, 2023 at 7:58 AM David Leibovic <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi there, I'm trying to optimize a slow query of this form:
>>>>>>
>>>>>> (1 * avg_over_time(foo{instance=~"$i"}[$interval]) <= 10) or
>>>>>> (2 * avg_over_time(foo{instance=~"$i"}[$interval]) <= 20) or
>>>>>> (3 * avg_over_time(foo{instance=~"$i"}[$interval]) <= 30) or
>>>>>> (10 * avg_over_time(foo{instance=~"$i"}[$interval]))
>>>>>>
>>>>>> I suspect it's slow because of the many duplicate calls to
>>>>>> avg_over_time(foo{instance=~"$i"}[$interval])
>>>>>>
>>>>>> Is there some way to only call the avg_over_time function once and
>>>>>> re-use the results subsequently? I'm using Prometheus in conjunction with
>>>>>> Grafana, in case it's relevant.
>>>>>>
>>>>>> The full query I'm trying to optimize is much more complicated, but I
>>>>>> figured the above would be enough to understand the problem. But in case
>>>>>> it's helpful, here is the full query I am trying to optimize (it's an Air
>>>>>> Quality Index computation):
>>>>>>
>>>>>> ((50 - 0) / (12 - 0) *
>>>>>> ((avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) <=
>>>>>> 12) - 0) + 0) or
>>>>>> ((100 - 51) / (35.4 - 12.1) *
>>>>>> ((avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) >
>>>>>> 12 and
>>>>>> avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) <=
>>>>>> 35.4) - 12.1) + 51) or
>>>>>> ((150 - 101) / (55.4 - 35.5) *
>>>>>> ((avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) >
>>>>>> 35.4 and
>>>>>> avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) <=
>>>>>> 55.4) - 35.5) + 101) or
>>>>>> ((200 - 151) / (150.4 - 55.5) *
>>>>>> ((avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) >
>>>>>> 55.4 and
>>>>>> avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) <=
>>>>>> 150.4) - 55.5) + 151) or
>>>>>> ((300 - 201) / (250.4 - 150.5) *
>>>>>> ((avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) >
>>>>>> 150.4 and
>>>>>> avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) <=
>>>>>> 250.4) - 150.5) + 201) or
>>>>>> ((400 - 301) / (350.4 - 250.5) *
>>>>>> ((avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) >
>>>>>> 250.4 and
>>>>>> avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) <=
>>>>>> 350.4) - 250.5) + 301) or
>>>>>> ((500 - 401) / (500.4 - 350.5) *
>>>>>> ((avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) >
>>>>>> 350.4 and
>>>>>> avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]) <=
>>>>>> 500.4) - 350.5) + 401) or
>>>>>> clamp_max(avg_over_time(ambient_pm25_env{instance=~"$room.*"}[$aqi_interval]),
>>>>>> 600)
>>>>>>
>>>>>> Thanks for any help you can provide!
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Prometheus Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/prometheus-users/664689b3-9f45-4b05-9438-4225e2dce773n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/prometheus-users/664689b3-9f45-4b05-9438-4225e2dce773n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>
>

