I also note that you're only sampling periodically whether something is in
state 0, 1 or 2: you don't really know what happened in between those
samples, so you're never going to get a truly accurate value for how long
it was in each state. For example, it could flip from 1 to 2 and back to 1
between scrapes.
If you want *really* accurate answers for how long your application has
been in state 0, 1 or 2 then you need to reinstrument it with metrics which
accumulate time in each state:
application_state_seconds_total{state="0"} xxx
application_state_seconds_total{state="1"} xxx
application_state_seconds_total{state="2"} xxx
Every time your application changes state, you add the number of seconds it
spent in that state to the total. In case it stays in the same state for a
long time, you also do this periodically: e.g. if the application has
remained in the same state for another 1 second, then you add 1 second to
the metric and subtract 1 second from the time you're accumulating.
This is basically how many of the disk I/O metrics
<https://brian-candler.medium.com/interpreting-prometheus-metrics-for-linux-disk-i-o-utilization-4db53dfedcfc>
work.
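The accumulation logic can be sketched in plain Python (the class and method
names here are illustrative; in a real application you would export the
per-state totals via your Prometheus client library as a counter with a
`state` label):

```python
import time

class StateTimer:
    """Accumulates wall-clock seconds spent in each state.

    Sketch of the accumulation logic only; exporting seconds_total as
    application_state_seconds_total{state="..."} is left to the client
    library you use.
    """

    def __init__(self, initial_state, clock=time.monotonic):
        self._clock = clock
        self._state = initial_state
        self._entered = clock()
        self.seconds_total = {}  # state -> accumulated seconds

    def _flush(self):
        # Credit the elapsed time to the current state and restart the clock.
        now = self._clock()
        self.seconds_total[self._state] = (
            self.seconds_total.get(self._state, 0.0) + (now - self._entered))
        self._entered = now

    def set_state(self, new_state):
        self._flush()  # credit time to the old state before switching
        self._state = new_state

    def tick(self):
        """Call periodically so long-lived states are credited too."""
        self._flush()
```

Calling tick() from a periodic task (or from the scrape handler itself)
keeps the totals current even when the state never changes.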
Then:
increase(application_state_seconds_total{state="1"}[24h])
will give you a *very* accurate estimate of how long the application was in
that state, even if it switches many times within the same second.
On Wednesday, 22 March 2023 at 08:06:23 UTC Brian Candler wrote:
> It's not easy to do exactly.
>
> To get a rough answer, you can do a subquery
> <https://prometheus.io/docs/prometheus/latest/querying/basics/#subquery>:
> (foo == 1)[24h:1m] will resample the timeseries at 1 minute intervals, then
> you can wrap that with count_over_time, giving:
> count_over_time((foo == 1)[24h:1m])
>
> But if you weren't scraping at exactly 1 minute intervals, the count may
> not be accurate. Also if there are missed samples, the value of foo at time
> T will look back for the previous value (up to 5 minutes by default), which
> means in that situation some samples may be double-counted (in effect,
> assuming the metric value remained constant over that time, when you don't
> actually know what value it had).
>
> The only way I know to get an exact answer is to send the range vector
> query "foo[24h]" to the *instant* query endpoint
> <https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries>,
> then filter and count the samples client-side. A range vector like that
> gives the raw values with their raw timestamps as stored in the TSDB.
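> The client-side counting could be sketched like this in Python (stdlib
> only; the base URL and metric name are illustrative, and the JSON layout
> is the documented instant-query response for a range-vector expression):

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def count_samples_equal(result, target=1.0):
    """Count raw samples equal to `target` in a range-vector result.

    `result` is the "data"."result" list returned by the Prometheus
    instant-query API for an expression like foo[24h]: each series
    carries a "values" list of [timestamp, "value"] pairs.
    """
    return sum(
        1
        for series in result
        for _ts, value in series.get("values", [])
        if float(value) == target
    )

def query_range_vector(base_url, expr):
    """Send a range-vector expression to the *instant* query endpoint."""
    url = base_url + "/api/v1/query?" + urlencode({"query": expr})
    with urlopen(url) as resp:
        body = json.load(resp)
    return body["data"]["result"]

# Example (assumes a reachable Prometheus server):
# result = query_range_vector("http://localhost:9090", "foo[24h]")
# print(count_samples_equal(result, target=1.0))
```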
>
> For this use case it would be nice if Prometheus were to allow certain
> operators to work directly on range vectors, so you could write
> foo[24h] == 1
> But that would add quite a lot of complexity into the semantics of the
> query language, which already has to consider argument combinations for
> (scalar, scalar), (scalar, instant vector) and (instant vector, instant
> vector).
>
> On Wednesday, 22 March 2023 at 05:19:01 UTC BHARATH KUMAR wrote:
>
>> Hello All,
>>
>> I have a Prometheus metric that will give output as 0 or 1 or 2. It can
>> be anything (0, 1 or 2). Could you tell me the number of 1's that occurred
>> in the last 24 hours?
>>
>> I tried with count_over_time. but I am getting errors. I tried
>> sum_over_time, but it is not working for a few test cases.
>>
>> Any lead?
>>
>> I really appreciate any help you can provide.
>>
>> Thanks & regards,
>> Bharath Kumar
>>
>