The node_exporter exposes per-CPU metrics because that's what most users
want. Per-core saturation, single-core I/O wait, and similar signals are
extremely useful and common use cases.

Using a recording rule is recommended.
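
A minimal sketch of such a recording rule (the group name is illustrative,
and the rule name just follows the common level:metric:operation naming
convention — adjust both to your setup):

```yaml
groups:
  - name: node_cpu_aggregation
    rules:
      # Per-instance, per-mode CPU usage rate, summed across all cores.
      # Dashboards and alerts can query this series instead of the raw
      # per-core node_cpu_seconds_total.
      - record: instance_mode:node_cpu_seconds:rate5m
        expr: sum by (instance, mode) (rate(node_cpu_seconds_total[5m]))
```

Note that rate() is taken before the sum, which is the usual order for
counters so that per-core counter resets are handled correctly.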

On Thu, Feb 2, 2023 at 10:05 AM koly li <[email protected]> wrote:

> If I use a recording rule to aggregate the data, then I have to store both
> the per-core samples and the aggregated samples in the same Prometheus,
> which costs a lot of memory.
>
> After some investigation of the node_exporter source code, I found:
> 1. The updateStat function (cpu_linux.go
> <https://github.com/prometheus/node_exporter/blob/master/collector/cpu_linux.go#L316>)
> reads the content of /proc/stat and generates the
> node_cpu_seconds_total samples per core.
> 2. updateStat calls c.fs.Stat() to read and parse the content of
> /proc/stat.
> 3. fs.Stat() parses /proc/stat and stores the aggregate CPU
> statistics in Stat.CPUTotal (stat.go
> <https://github.com/prometheus/procfs/blob/master/stat.go#L63>).
> 4. However, updateStat ignores Stat.CPUTotal; it only uses
> stats.CPU, which contains the per-core info.
>
> So the question is: why don't the node_exporter developers use CPUTotal to
> expose aggregate CPU statistics? Should new metrics for total usage
> statistics be added to node_exporter?
>
>
> On Thursday, February 2, 2023 at 2:40:34 PM UTC+8 Stuart Clark wrote:
> On 02/02/2023 06:26, koly li wrote:
> Hi,
>
> Currently, node_exporter exposes time series for each CPU core (an example
> below), which generates a lot of data in a large cluster (a 10k-node
> cluster). However, we only care about total CPU usage, not usage per
> core. So is there a way for node_exporter to expose only
> an aggregated node_cpu_seconds_total?
>
> We also noticed there is a discussion here (reduce cardinality of
> node_cpu_seconds_total
> <https://groups.google.com/g/prometheus-developers/c/tvPCYZYHOYc>), but
> it seems it reached no conclusion.
>
>
> node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="
> 10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="system",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"}
> 9077.24 1675059665571
>
> node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="
> 10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="user",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"}
> 19298.57 1675059665571
>
> node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="
> 10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="idle",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"}
> 1.060892164e+07 1675059665571
>
> node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="
> 10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="iowait",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"}
> 4.37 1675059665571
>
> You can't remove it as far as I'm aware, but you can use a recording rule
> to aggregate that data to just give you a metric that represents the
> overall CPU usage (not broken down by core/status).
> -- Stuart Clark
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/bc11f812-92b3-4b2d-81f8-e0720adc7510n%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/bc11f812-92b3-4b2d-81f8-e0720adc7510n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
