Re: [prometheus-users] What causes "counter jumped backwards" in node-exporter

Ben Kochie Mon, 20 Jul 2020 11:25:01 -0700

Thanks for posting this. I've been looking for more cases where people see
these issues.


Linux does not do any mutex locking of the CPU metric counters. So we added
some to the node_exporter in order to detect and mitigate spurious counter
resets.
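
To illustrate the idea, here is a minimal sketch of that kind of check. It is not the actual node_exporter code (that is written in Go; see the issue linked below). The class name `CpuStatCache`, the field names, and the exact message text are assumptions made for illustration. It keeps the last accepted value per CPU under a lock, treats a backwards idle counter as a possible hotplug event and resets, and otherwise ignores any single counter that moves backwards.

```python
import threading


class CpuStatCache:
    """Hypothetical cache of the last accepted per-CPU counters (seconds)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats = {}    # cpu id -> {field name: seconds}
        self.messages = []  # stands in for the exporter's log output

    def update(self, cpu, new):
        """Merge a fresh reading and return the values that would be exported."""
        with self._lock:
            old = self._stats.get(cpu)
            if old is None:
                # First reading for this CPU: accept it as-is.
                self._stats[cpu] = dict(new)
                return dict(new)
            if new["idle"] < old["idle"]:
                # Idle going backwards is treated as a possible hotplug event,
                # so every counter for this CPU is reset to the new reading.
                self.messages.append(
                    "CPU Idle counter jumped backwards, possible hotplug "
                    "event, resetting CPU stats"
                )
                self._stats[cpu] = dict(new)
                return dict(new)
            for field, value in new.items():
                if value >= old.get(field, 0.0):
                    old[field] = value
                else:
                    # Keep the previously accepted value so the exported
                    # counter never decreases.
                    self.messages.append(f"CPU {field} counter jumped backwards")
            return dict(old)
```

With this approach a 0.01 s backwards step is logged and the exported counter simply stays flat for one scrape, so Prometheus does not see a counter reset.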

In all of my testing and evidence, I've only seen this happen on iowait
data, so it's both interesting and useful that you see it on other counters
as well. I don't think your Docker use has any impact. I'd suspect it has more to do
with the underlying server environment. Is this bare metal? VMs? What
hypervisor?
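
One way to check whether the kernel itself reports the decrease, independent of node_exporter, is to compare successive snapshots of /proc/stat. The following is a rough sketch only: the helper names `parse_proc_stat` and `backward_steps` are made up for illustration, and `USER_HZ = 100` is an assumption (it is the usual value on x86_64 Linux, which would make one tick equal to 0.01 s).

```python
# Column order of the per-CPU lines in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")
USER_HZ = 100  # assumption: ticks per second on the host


def parse_proc_stat(text):
    """Return {"cpu0": {"user": ticks, ...}, ...} from /proc/stat contents."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        cpus[parts[0]] = dict(zip(FIELDS, map(int, parts[1:])))
    return cpus


def backward_steps(before, after):
    """List (cpu, field, delta_seconds) for every counter that decreased."""
    steps = []
    for cpu, old in before.items():
        new = after.get(cpu)
        if new is None:
            continue  # CPU disappeared between snapshots (e.g. hotplug)
        for field, old_ticks in old.items():
            diff = new.get(field, old_ticks) - old_ticks
            if diff < 0:
                steps.append((cpu, field, diff / USER_HZ))
    return steps
```

Reading /proc/stat in a loop, passing each pair of consecutive snapshots through `parse_proc_stat` and `backward_steps`, and logging any non-empty result would show which counters go backwards on a given host and by how much.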

See the discussion and code in this issue:
https://github.com/prometheus/node_exporter/issues/1686



On Mon, Jul 20, 2020 at 8:11 PM Bruce Merry <[email protected]> wrote:

> We've recently upgraded a fleet of machines to both Ubuntu 18.04 (Linux
> 5.3) and node-exporter 1.0.1 (run under Docker). We're now seeing a lot
> of messages like these:
>
> CPU Idle counter jumped backwards, possible hotplug event, resetting CPU
> stats
> CPU User counter jumped backwards
> CPU System counter jumped backwards
> CPU Iowait counter jumped backwards
> CPU SoftIRQ counter jumped backwards
>
> I examined a handful of messages and the counter seems to go backwards by
> a very small amount (0.01). Over the last 3 days, 74% of the messages are
> from one machine and 24% from another, with multiple machines making up the
> rest.
>
> I assume that this is something odd in the system itself and that
> node-exporter is just reporting what it sees. But any tips on what might
> cause it / how to fix it, and whether it is worth worrying about? Is it
> likely to be caused by running under Docker?
>
> Here's the uname and docker-compose.yml for completeness:
>
> Linux imgr1 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03
> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> version: '2'
> services:
>   node-exporter:
>     image: prom/node-exporter:v1.0.1
>     command:
>       - '--collector.textfile.directory=/textfile'
>       - '--no-collector.zfs'
>       - '--no-collector.wifi'
>       - '--no-collector.bcache'
>       - '--path.rootfs=/host'
>       - '--collector.vmstat.fields=^(oom_kill|pgpg|pswp|nr|pg.*fault).*'
>       - '--collector.filesystem.ignored-fs-types=^(tmpfs|autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|fuse.lxcfs|nfs|nsfs|ceph|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$$'
>     network_mode: host
>     pid: host
>     volumes:
>       - /var/spool/node-exporter:/textfile:ro
>       - /:/host:ro,rslave
>
> Thanks
> Bruce
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/2aac7b8b-8780-4a2f-973d-b0344720b876n%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/2aac7b8b-8780-4a2f-973d-b0344720b876n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

