If I use a recording rule to aggregate the data, I still have to store both the 
per-core samples and the aggregated samples in the same Prometheus, which costs 
a lot of memory.
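For reference, the kind of recording rule I mean would look something like this (the group and rule names here are illustrative, not taken from any existing config):

```yaml
groups:
  - name: node_cpu_aggregation   # illustrative name
    rules:
      # Collapse the per-core series into one series per instance/mode.
      # Note the original per-core samples still have to be scraped and
      # stored, which is exactly the memory cost described above.
      - record: instance_mode:node_cpu_seconds:rate5m
        expr: sum without (cpu) (rate(node_cpu_seconds_total[5m]))
```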

After some investigation of the node_exporter source code, I found:
1. The updateStat function (cpu_linux.go 
<https://github.com/prometheus/node_exporter/blob/master/collector/cpu_linux.go#L316>) 
reads the content of the /proc/stat file and generates the 
node_cpu_seconds_total samples per core
2. updateStat calls c.fs.Stat() to read and parse the content of 
/proc/stat
3. fs.Stat() parses the /proc/stat file and stores the aggregate CPU 
statistics in Stat.CPUTotal (stat.go 
<https://github.com/prometheus/procfs/blob/master/stat.go#L63>)
4. However, updateStat ignores Stat.CPUTotal; it only uses 
stats.CPU, which contains the per-core data
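The distinction between the aggregate line and the per-core lines of /proc/stat can be sketched as below. This is a simplified model of what procfs does, not the actual library code; the struct and function names here are illustrative, and only four of the real counters are shown:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CPUStat holds a few of the jiffies counters from one "cpu" line of
// /proc/stat (the real procfs CPUStat has more fields).
type CPUStat struct {
	User, Nice, System, Idle float64
}

// parseStat splits /proc/stat content into the aggregate line ("cpu ")
// and the per-core lines ("cpu0", "cpu1", ...) -- the same distinction
// procfs makes between Stat.CPUTotal and the Stat.CPU slice/map.
func parseStat(content string) (total CPUStat, perCore map[int]CPUStat) {
	perCore = make(map[int]CPUStat)
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 5 || !strings.HasPrefix(fields[0], "cpu") {
			continue
		}
		var s CPUStat
		s.User, _ = strconv.ParseFloat(fields[1], 64)
		s.Nice, _ = strconv.ParseFloat(fields[2], 64)
		s.System, _ = strconv.ParseFloat(fields[3], 64)
		s.Idle, _ = strconv.ParseFloat(fields[4], 64)
		if fields[0] == "cpu" {
			// Aggregate line: what procfs exposes as Stat.CPUTotal,
			// and what updateStat currently ignores.
			total = s
		} else if n, err := strconv.Atoi(strings.TrimPrefix(fields[0], "cpu")); err == nil {
			// Per-core line: what updateStat turns into one
			// node_cpu_seconds_total series per core and mode.
			perCore[n] = s
		}
	}
	return total, perCore
}

func main() {
	sample := "cpu  30 2 20 100\ncpu0 10 1 5 50\ncpu1 20 1 15 50"
	total, perCore := parseStat(sample)
	fmt.Printf("total user=%v cores=%d\n", total.User, len(perCore))
	// prints: total user=30 cores=2
}
```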

So the question is: why don't the node_exporter developers use CPUTotal to 
expose aggregate CPU statistics? Should a new metric for total usage 
be added to node_exporter?


On Thursday, February 2, 2023 at 2:40:34 PM UTC+8 Stuart Clark wrote:
On 02/02/2023 06:26, koly li wrote:
Hi, 

Currently, node_exporter exposes time series for each CPU core (an example 
below), which generates a lot of data in a large cluster (10k nodes). 
However, we only care about total CPU usage, not usage per 
core. So is there a way for node_exporter to expose 
only an aggregated node_cpu_seconds_total?

We also noticed a discussion here (reduce cardinality of 
node_cpu_seconds_total 
<https://groups.google.com/g/prometheus-developers/c/tvPCYZYHOYc>), but it 
seems to have reached no conclusion.

node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="system",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 9077.24 1675059665571
node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="user",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 19298.57 1675059665571
node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="idle",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 1.060892164e+07 1675059665571
node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="iowait",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 4.37 1675059665571

You can't remove it as far as I'm aware, but you can use a recording rule 
to aggregate that data to just give you a metric that represents the 
overall CPU usage (not broken down by core/status).
-- Stuart Clark 

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
