Thanks for posting this. I've been looking for more cases where people see these issues.
Linux does not do any mutex locking of the CPU metric counters, so we added some logic to the node_exporter to detect and mitigate spurious counter resets. In all of my testing and the evidence I've collected, I've only seen this happen on iowait data, so it's interesting, and useful, that you see it on other counters as well.

I don't think your Docker use has any impact. I'd suspect it has more to do with the underlying server environment. Is this bare metal? VMs? What hypervisor?

See the discussion and code in this issue:
https://github.com/prometheus/node_exporter/issues/1686

On Mon, Jul 20, 2020 at 8:11 PM Bruce Merry <[email protected]> wrote:

> We've recently upgraded a fleet of machines to both Ubuntu 18.04 (Linux
> 5.3) and node-exporter to 1.0.1 (run under Docker). We're now seeing a lot
> of messages like these:
>
> CPU Idle counter jumped backwards, possible hotplug event, resetting CPU stats
> CPU User counter jumped backwards
> CPU System counter jumped backwards
> CPU Iowait counter jumped backwards
> CPU SoftIRQ counter jumped backwards
>
> I examined a handful of messages and the counter seems to go backwards by
> a very small amount (0.01). Over the last 3 days, 74% of the messages are
> from one machine and 24% from another, with multiple machines making up the
> rest.
>
> I assume that this is something odd in the system itself and that
> node-exporter is just reporting what it sees. But any tips on what might
> cause it / how to fix it, and whether it is worth worrying about? Is it
> likely to be caused by running under Docker?
>
> Here's the uname and docker-compose.yml for completeness:
>
> Linux imgr1 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03
> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> version: '2'
> services:
>   node-exporter:
>     image: prom/node-exporter:v1.0.1
>     command:
>       - '--collector.textfile.directory=/textfile'
>       - '--no-collector.zfs'
>       - '--no-collector.wifi'
>       - '--no-collector.bcache'
>       - '--path.rootfs=/host'
>       - '--collector.vmstat.fields=^(oom_kill|pgpg|pswp|nr|pg.*fault).*'
>       - '--collector.filesystem.ignored-fs-types=^(tmpfs|autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|fuse.lxcfs|nfs|nsfs|ceph|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$$'
>     network_mode: host
>     pid: host
>     volumes:
>       - /var/spool/node-exporter:/textfile:ro
>       - /:/host:ro,rslave
>
> Thanks
> Bruce
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/2aac7b8b-8780-4a2f-973d-b0344720b876n%40googlegroups.com
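
For reference, the detect-and-mitigate idea mentioned at the top of this reply can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the actual node_exporter code: the `cpuStat` and `statCache` types, the `keepMonotonic` helper, and the choice of only three counters are all hypothetical, and the log text simply mirrors the messages quoted above. The exporter-side cache is guarded by a mutex, an idle counter that goes backwards is treated as a possible hotplug event and resets that CPU's cached stats, and any other counter that goes backwards is logged while the previous value is kept.

```go
package main

import (
	"fmt"
	"sync"
)

// cpuStat holds a few per-CPU counters in seconds (hypothetical type).
type cpuStat struct {
	Idle, User, Iowait float64
}

// statCache remembers the last accepted counter values for each CPU.
type statCache struct {
	mu    sync.Mutex
	stats []cpuStat
}

// keepMonotonic returns next if the counter did not go backwards;
// otherwise it logs the event and returns the previous value.
func keepMonotonic(name string, cpu int, prev, next float64) float64 {
	if next >= prev {
		return next
	}
	fmt.Printf("CPU %s counter jumped backwards (cpu %d: %v -> %v)\n", name, cpu, prev, next)
	return prev
}

// update merges freshly read counters into the cache.
func (c *statCache) update(newStats []cpuStat) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Start over if the number of CPUs has changed.
	if len(c.stats) != len(newStats) {
		c.stats = make([]cpuStat, len(newStats))
	}
	for i, n := range newStats {
		// Idle going backwards is assumed to be a hotplug event:
		// reset everything cached for this CPU.
		if n.Idle < c.stats[i].Idle {
			fmt.Printf("CPU Idle counter jumped backwards, possible hotplug event, resetting CPU stats (cpu %d)\n", i)
			c.stats[i] = cpuStat{}
		}
		c.stats[i].Idle = n.Idle
		c.stats[i].User = keepMonotonic("User", i, c.stats[i].User, n.User)
		c.stats[i].Iowait = keepMonotonic("Iowait", i, c.stats[i].Iowait, n.Iowait)
	}
}

func main() {
	c := &statCache{}
	c.update([]cpuStat{{Idle: 100, User: 10, Iowait: 5.00}})
	// Iowait dips by 0.01, similar to the small jumps reported above.
	c.update([]cpuStat{{Idle: 101, User: 11, Iowait: 4.99}})
	fmt.Printf("%+v\n", c.stats[0])
}
```

In this sketch the second update logs the iowait dip and the cached iowait value stays at its earlier reading, so the exported counter never appears to decrease.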

