This is incredibly helpful, thanks for taking the time to write it. I don't
think there is anything like this level of description of how expr works in
the docs, but I may have missed it.

You also correctly anticipated that the missing-time-series scenario was an
issue for me in this work, so thanks for that too.

cam

On Fri, 13 Dec 2024 at 12:00, 'Brian Candler' via Prometheus Users <
prometheus-users@googlegroups.com> wrote:

> > I do not really understand how expr works in prom rules - is it
> something that simply evaluates to either 1 or 'true' as a go bool type?
>
> No. It's not boolean logic at all.
>
> PromQL works with *vectors*: a vector contains zero or more values, each
> with a distinct set of labels. An alert fires whenever the vector is
> non-empty, regardless of the value. That is, a value of 0 triggers an alert
> just as much as a value of 1000. It's the presence or absence of a value
> which controls alerting.
>
> Take, for example, the promql query "foo". It might return the following,
> all current values of metric foo:
>
> foo{instance="aaa"} 7
> foo{instance="bbb"} 3
> foo{instance="ccc"} 1
>
> That's a vector with three values.
>
> Now take the promql query "foo > 2". It returns a vector with 2 values:
>
> foo{instance="aaa"} 7
> foo{instance="bbb"} 3
>
> If you use "foo > 2" as an alerting expression, then you'll have two
> alerts firing.  If the value of foo{instance="bbb"} drops to 2 or less,
> then the alerting expression returns an instant vector with only one value,
> so the bbb alert resolves, but the aaa alert continues.
>
> This is the reason why "resolved" messages show the most recent value
> which triggered the alert, not the current (non-alerting) value. The
> current value is below the threshold, so is filtered out entirely from the
> PromQL results.
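>
> For contrast, a minimal illustration using the same sample values of foo:
> adding the "bool" modifier turns the comparison from a filter into a 0/1
> result, so no element is dropped:
>
> foo > bool 2
>
> {instance="aaa"} 1
> {instance="bbb"} 1
> {instance="ccc"} 0
>
> That vector is never empty while foo exists, so an alerting expression
> ending in "!= bool 0" (as in the original rule) keeps firing even when its
> value is 0.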
>
> Now, an expression like count({__name__=~"tcpsocket(.+)Inbound"}) also
> gives a vector as its result. If there are no timeseries inside the
> parentheses, then it is the empty vector. If there are one or more
> timeseries, then you get a single-element vector containing a single value
> (which is the count of timeseries) and an empty label set.  You can try
> this for yourself in the PromQL query browser:
>
> count({__name__=~"blah_nonexistent(.*)"})   # empty result
> count({__name__=~"node_filesystem(.*)"})    # {} 1234
>
> where {} means "empty label set".
>
> Now, when you do a binary operation between two vector values, by default
> the result vector has one entry for every label set which matches exactly
> between the LHS and RHS vectors. Any label set on the LHS which is not
> matched on the RHS, or vice versa, is discarded and gives no value in the
> result vector.  But in this case, since the LHS and RHS will (almost)
> always have a single entry with empty label set, it will match.
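>
> As an illustration of the general case, assume a second, hypothetical
> metric bar (made up here, not from your setup) alongside the foo values
> above:
>
> bar{instance="aaa"} 2
> bar{instance="bbb"} 3
> bar{instance="ddd"} 5
>
> foo - bar
>
> {instance="aaa"} 5
> {instance="bbb"} 0
>
> Only aaa and bbb appear on both sides; ccc and ddd have no partner and are
> dropped. Note that bbb is present with value 0, which would still fire if
> this were an alerting expression.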
>
> Therefore, what I think you want is simply:
>
> expr: count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) !=
> count({__name__=~"tcpsocket(.+)Inbound"})
>
> That should do what you want *unless* __name__=~"tcpsocket(.+)Inbound"
> matches no timeseries at all, in which case the vector will be empty (on
> either the LHS or the RHS) and therefore the count() will be empty, and
> there's nothing to match to the other side.  If this is an important case
> for you then you can fake up a vector with empty labels:
>
> expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) or vector(0))
> != (count({__name__=~"tcpsocket(.+)Inbound"}) or vector(0))
>
> Again, PromQL's "or" operator doesn't behave like a boolean expression.
> What "or" does is to match the vectors on the LHS and the RHS:
> - for any value on the LHS, use the value and label set from the LHS in
> the result (whether or not it matches something in the RHS)
> - for any value on the RHS, whose label set does not exist in the LHS,
> then add it to the result.
>
> vector(0) is a static value: an instant vector containing one element
> whose label set is empty with value 0.  So if count() returns nothing,
> "... or vector(0)" supplies an element with empty label set and value 0 in
> its place. Both sides of "!=" then always have something to match, and a
> change from zero timeseries to some (or back) triggers the alert. Note that
> "or" binds more loosely than "!=", so the parentheses are needed: without
> them, "or vector(0)" would apply to the result of the comparison, and the
> alert would fire (with value 0) whenever the two counts are equal.
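>
> As a sketch of how "or" combines labelled vectors, take the foo values from
> earlier and a hypothetical metric baz (made up for illustration):
>
> baz{instance="bbb"} 9
> baz{instance="ddd"} 5
>
> foo > 2 or baz
>
> foo{instance="aaa"} 7
> foo{instance="bbb"} 3
> baz{instance="ddd"} 5
>
> baz{instance="bbb"} is left out because the LHS already has an element with
> that label set; baz{instance="ddd"} is added because it does not.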
>
> On Friday, 13 December 2024 at 09:39:02 UTC cam wrote:
>
>> This took about a week to appear on the list? Meantime, I have come up
>> with the following..
>>
>>   - alert: outboundSocketCountChange
>>     expr: ((count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
>> count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0) == 1
>>     labels:
>>       severity: critical
>>     annotations:
>>       summary: OB socket count has changed
>>
>> This does what I need but it makes me think I do not really understand
>> how expr works in prom rules - is it something that simply evaluates to
>> either 1 or 'true' as a go bool type?
>>
>> c
>>
>> On Friday, 13 December 2024 at 08:49:33 UTC cam wrote:
>>
>>> Hello all,
>>>
>>> I have a rule which is trying to count time series that match a certain
>>> regexp and spot when this changes, to raise an alert more or less
>>> immediately (i.e. no for clause). This is counting a custom socket count
>>> metric that we need to catch any changes in.
>>>
>>>   - alert: outboundSocketCountChange
>>>     expr: (count({__name__=~"tcpsocket(.+)Inbound"} offset 30s) -
>>> count({__name__=~"tcpsocket(.+)Inbound"})) != bool 0
>>>     labels:
>>>       severity: critical
>>>     annotations:
>>>       summary: OB socket count has changed
>>>
>>> It triggers fine when the value changes but it appears to then be stuck
>>> in firing, rather than resolving when the next evaluation window completes.
>>> Graphing the promQL shows exactly what I would expect - a single spike to 1
>>> when the value changes and then back to zero. I would expect the alert to
>>> clear when it hits that zero.
>>>
>>> Scrape and evaluation intervals are both set to 15s. Prom v2.45.
>>>
>>> Am I missing something here?
>>>
>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Prometheus Users" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/prometheus-users/AfVOhJ5rfOg/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> prometheus-users+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/prometheus-users/77fed316-4283-4fc3-98d9-99bcf630e37bn%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/77fed316-4283-4fc3-98d9-99bcf630e37bn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
............................................................
colm.mccar...@gmail.com

