On Sunday 21 July 2024 at 00:51:48 UTC+1 Christoph Anton Mitterer wrote:

> Hey.
> On Sat, 2024-07-20 at 10:26 -0700, 'Brian Candler' via Prometheus
> Users wrote:
> > If the label stays constant, then the amount of extra space required
> > is tiny. There is an internal mapping between a bag of labels and a
> > timeseries ID.
>
> Is it the same if one uses a metric (like for the RPMs from below) and
> that never changes? I mean is that also efficient?

Yes:

smartraid_physical_drive_rotational_speed_rpm 7200
smartraid_info{rpm="7200"} 1

are both static timeseries. Prometheus does delta compression; if you store the same value repeatedly, the difference between adjacent points is zero. It doesn't matter whether the timeseries value is 1 or 7200.

> > But if any label changes, that generates a completely new timeseries.
> > This is not something you want to happen too often (a.k.a. "timeseries
> > churn"), but moderate amounts are OK.
>
> Why exactly wouldn't one want this? I mean especially with respect to
> such _info metrics.

It's just a general consideration. When a timeseries churns you get a new index entry, new head blocks etc. For info metrics which rarely change, it's fine.

The limiting worst case is where you have a label value that changes on every sample (for example, putting a timestamp in a label). Then every scrape generates a new timeseries containing one point. Have a few hundred thousand scrapes like that and your server will collapse.

> Graphing _info time series doesn't make sense anyway... so it's not as
> if one would get some usable time series/graph (like a temperature or
> so) interrupted, if e.g. the state changes for a while from OK to
> degraded.

Indeed, and Grafana has a swim-lanes type view that works quite well for that.

When a time series disappears, it goes "stale". The good news is that, for quite some time now, Prometheus has been automatically inserting staleness markers for a timeseries which existed in the previous scrape but not in the current scrape from the same job and target. Prior to that, a timeseries would only go stale if no data point had been ingested for 5 minutes, so it was very unclear when the timeseries had actually vanished.

> I guess with appearing/disappearing you mean that one has to take into
> account that e.g. pd_info{state="OK",pd_name="foo"} won't exist while
> "foo" is failed... and thus e.g. when graphing the OK-times of a
> device, it would by default show nothing during that time and not a
> value of zero?

Yes. And it's a bit harder to alert on that condition, but you just have to approach it the right way. As you've realised, you can alert on the presence of a timeseries with a label other than "OK", which is easier than alerting on the absence of a timeseries whose label is "OK".
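For example, a minimal sketch of such an alert expression, reusing the pd_info metric and state label from your example above (adjust the names to whatever your exporter actually exposes):

# fires one alert per physical drive that reports any state other than OK
pd_info{state!="OK"}

Any series returned by the expression triggers the alert; a healthy drive simply doesn't export a matching series, so nothing fires for it.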
> > The other option, if the state values are integer enumerations at
> > source (e.g. as from SNMP), is to store the raw numeric value:
> >
> > foo 3
> >
> > That means the querier has to know the meaning of these values.
> > (Grafana can map specific values to textual labels and/or colours
> > though).
>
> But that also requires me to use a label like in enum_metric{value="3"},

No, I mean

my_metric{other_labels="dontcare"} 3

An example is ifOperStatus in SNMP, where the meaning of values 1, 2, 3 etc. is defined in the MIB.

> or I have to construct metric names dynamically (which I could also
> have done for the symbolic name), which seems however discouraged (and
> I'd say for good reasons)?

Don't generate metric names dynamically. That's what labels are for. (In any case, the metric name is itself just a hidden label called "__name__".)

There is good advice at https://prometheus.io/docs/practices/naming/

> I mean if both label and metric are equally efficient (in terms of
> storage)... then using a metric would still have the advantage of
> being able to do things like:
>
> smartraid_logical_drive_chunk_size_bytes > (256*1024)
>
> i.e. select those LDs that use a chunk size > 256 KiB... which I
> cannot (as easily) do if it's in a label.

Correct. The flip side is that if you want to see at a glance all the information about a logical volume, you'll need to look at a bunch of different metrics and associate them by some common label (e.g. a unique volume ID).

Both approaches are valid. If you see a use case for the filtering or arithmetic, that pushes you down the path of separate metrics.

If you're comparing a hundred static metrics versus a single metric with a hundred labels then I'd *guess* the single metric would be a bit more efficient in terms of storage and ingestion performance, but it's marginal and shouldn't really be a consideration: data is there to be used, so put it in whatever form allows you to make best use of it.

You can look at other exporters for guidance. For example, node_exporter has node_md_* metrics for MD arrays. It provides a combined metric:

node_md_info{ActiveDevices="10", ChunkSize="512K", ConsistencyPolicy="none", CreationTime="Fri Feb 19 12:20:25 2021", FailedDevices="0", Layout="-unknown-", Name="dar6:127 (local to host dar6)", Persistence="Superblock is persistent", RaidDevices="10", RaidLevel="raid0", SpareDevices="0", State="clean ", TotalDevices="10", UUID="6c1f02c0:4ade9cee:17936d5f:1990e5db", Version="1.2", WorkingDevices="10", md_device="md127", md_metadata_version="1.2", md_name="dar6:127", md_num_raid_disks="10", raid_level="0"} 1

But there are also separate metrics for:

node_md_info_ActiveDevices
node_md_info_ArraySize
node_md_info_Events
node_md_info_FailedDevices
node_md_info_RaidDevices
node_md_info_SpareDevices
node_md_info_TotalDevices
node_md_info_WorkingDevices

> Should I have made e.g. only one smartraid_temperature{type="bla"}
> metric (or perhaps with a bit more than just "type"), with "bla" being
> e.g. controller, capacitor, cache_module or "sensor"? I.e. putting all
> temperatures in one metric, rather than the 4 different ones I have
> now (where I have no "type" label)?

Personally I'd make all "temperature readings" be one metric, with labels to distinguish them. It's more useful for graphing and aggregation. Having lots of different metrics is just harder to work with, in particular when it comes to making dashboards for them. It's easy to go the other way (e.g. select all temperature readings with type="bla").

However, as a guideline, I'd suggest that all the readings for a given metric have the *same* set of labels, and they should all be non-empty (since an empty label is exactly the same as an absent label). That is: if you decide to categorise your temperature readings with two labels, say foo and bar, then every temperature reading should have foo="..." and bar="...". It *can* work if you don't follow that rule, but it's much easier to make mistakes, especially with join queries, so don't make your life harder than it needs to be.
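To make the join-query point concrete, here is a rough sketch of the usual info-metric join pattern. The names smartraid_logical_drive_info, ld_id and raid_level are only illustrative; substitute whatever your exporter actually exposes:

# copy the raid_level label from the info metric onto the chunk-size metric,
# matching the two series on their shared ld_id label
# (the info metric and its labels are illustrative names only)
smartraid_logical_drive_chunk_size_bytes
  * on (ld_id) group_left (raid_level)
  smartraid_logical_drive_info

Because the info metric's value is 1, the multiplication leaves the chunk size unchanged while attaching the raid_level label to the result. This is exactly the kind of query where inconsistent or empty label sets cause surprises, hence the guideline above.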
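Going back to the temperature question, as a sketch of what the single-metric shape could look like on the /metrics page (the name smartraid_temperature_celsius and the controller label are just assumed for illustration; the type values are the ones you listed):

# HELP smartraid_temperature_celsius Temperature reported by the RAID controller, in Celsius.
# TYPE smartraid_temperature_celsius gauge
smartraid_temperature_celsius{controller="0",type="controller"} 52
smartraid_temperature_celsius{controller="0",type="capacitor"} 31
smartraid_temperature_celsius{controller="0",type="cache_module"} 38

Selecting one kind is then just smartraid_temperature_celsius{type="capacitor"}, and aggregations like max by (controller) (smartraid_temperature_celsius) come for free, which is much more awkward with four separate metric names.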