On Sunday 21 July 2024 at 00:51:48 UTC+1 Christoph Anton Mitterer wrote:

Hey. 

On Sat, 2024-07-20 at 10:26 -0700, 'Brian Candler' via Prometheus Users 
wrote: 
> 
> If the label stays constant, then the amount of extra space required 
> is tiny.  There is an internal mapping between a bag of labels and a 
> timeseries ID. 

Is it the same if one uses a metric (like for the RPMs from below) and 
that never changes? I mean is that also efficient?


Yes:

smartraid_physical_drive_rotational_speed_rpm 7200
smartraid_info{rpm="7200"} 1

are both static timeseries. Prometheus does delta compression; if you store 
the same value repeatedly the difference between adjacent points is zero. 
It doesn't matter if the timeseries value is 1 or 7200.

 


> But if any label changes, that generates a completely new timeseries. 
> This is not something you want to happen too often (a.k.a "timeseries 
> churn"), but moderate amounts are OK. 

Why exactly wouldn't one want this? I mean especially with respect to 
such _info metrics.


It's just a general consideration. When a timeseries churns you get a new 
index entry, new head blocks etc.

For info metrics which rarely change, it's fine.

The limiting worst case is where you have a label value that changes every 
sample (for example, putting a timestamp in a label). Then every scrape 
generates a new timeseries containing one point. Have a few hundred 
thousand scrapes like that and your server will collapse.
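
For illustration, this (made-up metric, not one of yours) is the kind of 
thing to avoid:

backup_last_run_info{completed_at="2024-07-20T10:26:00Z"} 1

Every scrape produces a different completed_at value, hence a brand-new 
timeseries with a single point in it. The usual fix is to export the 
timestamp as the metric's *value* instead, e.g. something like 
backup_last_run_timestamp_seconds.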

 


Graphing _info time series doesn't make sense anyway... so it's not as 
if one would get some usable time series/graph (like a temperature or 
so) interrupted, if e.g. the state changes for a while from OK to 
degraded.


Indeed, and Grafana has a swim-lanes type view that works quite well for 
that.  When a time series disappears, it goes "stale". But the good news 
is, for quite some time now, Prometheus has been automatically inserting 
staleness markers for a timeseries which existed in a previous scrape but 
not in the current scrape from the same job and target.

Prior to that, timeseries would only go stale if there had been no data 
point ingested for 5 minutes, so it would be very unclear when the 
timeseries had actually vanished.
 


I guess with appearing/disappearing you mean that one has to take into 
account that e.g. pd_info{state="OK",pd_name="foo"} won't exist while 
"foo" is failed... and thus e.g. when graphing the OK-times of a 
device, it would by default show nothing during that time rather than a 
value of zero?


Yes. And it's a bit harder to alert on that condition, but you just have to 
approach it the right way. As you've realised, you can alert on the 
presence of a timeseries with a label not "OK", which is easier than 
alerting on the absence of a timeseries whose label is "OK".
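
For example (made-up metric and label names along the lines of your 
exporter), the alert expression can simply be:

smartraid_physical_drive_info{state!="OK"}

which fires for any drive currently exposing a not-OK state. Alerting on 
the *absence* of the state="OK" series per drive would instead need 
absent() or an "unless" join, which is much more fiddly.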

 

> The other option, if the state values are integer enumerations at 
> source (e.g. as from SNMP), is to store the raw numeric value: 
> 
> foo 3 
> 
> That means the querier has to know the meaning of these values. 
> (Grafana can map specific values to textual labels and/or colours 
> though). 

But that also requires me to use a label like in enum_metric{value=3},


No, I mean

my_metric{other_labels="dontcare"} 3

An example is ifOperStatus in SNMP, where the meaning of values 1, 2, 3 
...etc is defined in the MIB.
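
For instance, scraped through snmp_exporter you would typically get 
something like:

ifOperStatus{ifIndex="2", ifDescr="eth0"} 2

and you have to know from IF-MIB that 1=up, 2=down, 3=testing, etc. An 
alert on "not up" is then just ifOperStatus != 1.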

 


or I have to construct metric names dynamically (which I could also 
have done for the symbolic name), which however seems to be discouraged 
(and I'd say for good reasons)?


Don't generate metric names dynamically. That's what labels are for.  (In 
any case, the metric name is itself just a hidden label called "__name__")
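
You can even use that in a selector; e.g. this (illustrative) regex match 
on the name label

{__name__=~"smartraid_.+_temperature_celsius"}

behaves exactly like any other label matcher.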
 
There is good advice at https://prometheus.io/docs/practices/naming/

I mean if both, label and metric, are equally efficient (in terms of 
storage)... then using a metric would still have the advantage of being 
able to do things like: 
smartraid_logical_drive_chunk_size_bytes > (256*1024) 
i.e. select those LDs that use a chunk size > 256 KiB ... which I 
cannot (as easily) do if it's in a label.


Correct. The flip side is that if you want to see at a glance all the 
information about a logical volume, you'll need to look at a bunch of 
different metrics and associate them by some common label (e.g. a unique 
volume ID).
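
The usual way to do that association is a group_left join against the 
_info metric; a sketch with hypothetical metric/label names:

smartraid_logical_drive_chunk_size_bytes
  * on (ld_id) group_left (raid_level, state)
    smartraid_logical_drive_info

which copies the raid_level and state labels from the info series onto the 
chunk-size series sharing the same ld_id.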

Both approaches are valid.  If you see a use case for the filtering or 
arithmetic, that pushes you down the path of separate metrics.

If you're comparing a hundred static metrics versus a single metric with a 
hundred labels then I'd *guess* the single metric would be a bit more 
efficient in terms of storage and ingestion performance, but it's marginal 
and shouldn't really be a consideration: data is there to be used, so put 
it in whatever form allows you to make best use of it.

You can look at other exporters for guidance. For example, node_exporter 
has node_md_* metrics for MD arrays.  It provides a combined metric:

node_md_info{ActiveDevices="10", ChunkSize="512K", ConsistencyPolicy="none",
CreationTime="Fri Feb 19 12:20:25 2021", FailedDevices="0", Layout="-unknown-",
Name="dar6:127 (local to host dar6)", Persistence="Superblock is persistent",
RaidDevices="10", RaidLevel="raid0", SpareDevices="0", State="clean ",
TotalDevices="10", UUID="6c1f02c0:4ade9cee:17936d5f:1990e5db", Version="1.2",
WorkingDevices="10", md_device="md127", md_metadata_version="1.2",
md_name="dar6:127", md_num_raid_disks="10", raid_level="0"} 1
 
But there are also separate metrics for:

node_md_info_ActiveDevices
node_md_info_ArraySize
node_md_info_Events
node_md_info_FailedDevices
node_md_info_RaidDevices
node_md_info_SpareDevices
node_md_info_TotalDevices
node_md_info_WorkingDevices
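
The latter make filtering and alerting trivial, e.g. an array with failed 
members is just:

node_md_info_FailedDevices > 0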



Should I have made e.g. only one 
smartraid_temperature{type="bla"} 
metric (or perhaps with a bit more than just "type"), with "bla" being 
e.g. controller, capacitor, cache_module or "sensor"?

I.e. putting all temperatures in one metric, rather than the 4 
different ones I have now (where I have no "type" label).


Personally I'd make all "temperature readings" be one metric, with labels 
to distinguish them.  It's more useful for graphing and aggregation.  
Having lots of different metrics is just harder to work with, in particular 
when it comes to making dashboards for them.  It's easy to go the other way 
(e.g. select all temperature readings with type="bla").
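
As a hypothetical sketch of what that could look like for you:

smartraid_temperature_celsius{type="controller"} 55
smartraid_temperature_celsius{type="capacitor"} 38
smartraid_temperature_celsius{type="cache_module"} 41
smartraid_temperature_celsius{type="sensor"} 29

Then max by (type) (smartraid_temperature_celsius), or a single graph of 
the whole metric, falls out naturally.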

However as a guideline, I'd suggest that all the readings for a given 
metric have the *same* set of labels, and they should all be non-empty 
(since an empty label is exactly the same as an absent label). That is: if 
you decide to categorise your temperature readings with two labels, say foo 
and bar, then every temperature reading should have foo="..." and 
bar="...". 

It *can* work if you don't follow that rule, but it's much easier to make 
mistakes especially with join queries, so don't make your life harder than 
it needs to be.
