If I use a recording rule to aggregate the data, I have to store both the per-core samples and the aggregated samples in the same Prometheus, which costs a lot of memory.
After some investigation of the node_exporter source code, I found:

1. The updateStat function (cpu_linux.go <https://github.com/prometheus/node_exporter/blob/master/collector/cpu_linux.go#L316>) reads the content of the /proc/stat file and generates the node_cpu_seconds_total samples per core.
2. updateStat calls c.fs.Stat() to read and parse the content of /proc/stat.
3. fs.Stat() parses /proc/stat and stores the total CPU statistics in Stat.CPUTotal (stat.go <https://github.com/prometheus/procfs/blob/master/stat.go#L63>).
4. However, updateStat ignores Stat.CPUTotal; it only uses stats.CPU, which contains the per-core information.

So the question is: why don't the node_exporter developers use CPUTotal to expose total CPU statistics? Should new metrics for total CPU usage be added to node_exporter?

On Thursday, February 2, 2023 at 2:40:34 PM UTC+8 Stuart Clark wrote:

On 02/02/2023 06:26, koly li wrote:

Hi,

Currently, node_exporter exposes a time series for each CPU core (an example is below), which generates a lot of data in a large cluster (a 10k-node cluster). However, we only care about total CPU usage, not per-core usage. So is there a way for node_exporter to expose only an aggregated node_cpu_seconds_total?

We also noticed a discussion here (reduce cardinality of node_cpu_seconds_total <https://groups.google.com/g/prometheus-developers/c/tvPCYZYHOYc>), but it seems to have reached no conclusion.
node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="system",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 9077.24 1675059665571
node_cpu_seconds_total{container="node-exporter",cpu="85",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="user",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 19298.57 1675059665571
node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="idle",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 1.060892164e+07 1675059665571
node_cpu_seconds_total{container="node-exporter",cpu="86",endpoint="metrics",hostname="603k09311-9-bjsimu01",instance="10.253.108.171:9100",ip="10.253.108.171",job="node-exporter",mode="iowait",namespace="product-coc-monitor",pod="coc-monitor-prometheus-node-exporter-c2plp",service="coc-monitor-prometheus-node-exporter",prometheus="product-coc-monitor/coc-prometheus",prometheus_replica="prometheus-coc-prometheus-1"} 4.37 1675059665571

You can't remove it as far as I'm aware, but you can use a recording rule to aggregate that data to just give you a metric that represents the overall CPU usage (not broken down by core/status).
-- Stuart Clark

