Package: atop
Version: 2.2.6-4
Severity: important

Dear Maintainer,

Context:

Running a bare-metal server with 512GB RAM + 10 SSD disks + 16 cores +
MariaDB 10.1.
This database is part of a farm, with 2 other hosts (which are running
Debian 8).
The running kernel is: 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02)

This host has been suffering mysterious connection errors happening every
10 minutes.
During those errors, the following facts were observed:

* Connection errors (hosts not being able to connect to MySQL)
* Packet drops
* Multiple cores going to 100% usage every 10 minutes for some seconds.
* The other hosts in the farm (which receive the same traffic), running
  Debian 8, had no issues whatsoever

This is an output of mpstat -P ALL 1 during the errors (filtered to show
only the CPUs going to 100% in %usr or %sys):

02:20:04 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:05 PM   11  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:05 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:06 PM    2    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:07 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:08 PM    0    0.00    0.00   99.00    0.00    0.00    1.00    0.00    0.00    0.00    0.00
02:20:08 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:09 PM   13   95.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00    0.00    3.00
02:20:09 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:10 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:11 PM    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:11 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:12 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:12 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:13 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:13 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:14 PM   13   91.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    9.00

And this was captured during each atop execution:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9476 root       0 -20   29232  12300   3540 R  35.1  0.0   5:44.59 atop

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9476 root       0 -20   29232  12300   3540 R  99.3  0.0   5:47.60 atop

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9476 root       0 -20   31680  14672   3540 S   5.0  0.0   5:47.75 atop

On Debian 9, atop is started by default as:

/usr/bin/atop -a -R -w /var/log/atop/atop_20180418 600

Whereas on Debian 8 it is started by default as:

/usr/bin/atop -a -w /var/log/atop/atop_20180424 600

Debian 8 hosts had no issues.

The difference is the -R option which, per the documentation, can be a
really expensive operation.
Quoting the documentation:

  Since gathering of all values that are needed to calculate the PSIZE is
  a relatively time-consuming task, the 'R' key (or '-R' flag) should be
  active. Gathering these values also requires superuser privileges
  (otherwise '?K' is shown in the output).

Starting atop on Debian 9 without -R makes the problems on this host go
away.
This is a package change from Debian 8 to Debian 9.

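To check other hosts in the farm for the same default, the two command lines quoted above can be compared programmatically. The following is only a minimal sketch: the helper name atop_flag_enabled is hypothetical, and it assumes atop parses its options getopt-style (short flags may be clustered, and -w takes the file name that follows it).

```python
import shlex


def atop_flag_enabled(cmdline: str, flag: str = "R") -> bool:
    """Report whether a single-letter flag appears in an atop command line.

    Assumption: getopt-style parsing, where flags can be clustered ("-aR")
    and -w consumes the rest of its cluster (or the next token) as a file.
    """
    tokens = shlex.split(cmdline)[1:]  # drop the program path
    for token in tokens:
        if not token.startswith("-") or token in ("-", "--"):
            continue  # positional argument such as a path or the interval
        for letter in token[1:]:
            if letter == flag:
                return True
            if letter == "w":
                break  # remainder of the cluster is the file name for -w
    return False


# Command lines taken from this report.
debian9 = "/usr/bin/atop -a -R -w /var/log/atop/atop_20180418 600"
debian8 = "/usr/bin/atop -a -w /var/log/atop/atop_20180424 600"

print(atop_flag_enabled(debian9))  # True
print(atop_flag_enabled(debian8))  # False
```

On a live host the command line of a running atop could be read from /proc/<pid>/cmdline (NUL-separated) and passed to the same helper after joining the fields with spaces.
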
As this option can lead to serious issues, especially on high-traffic
systems, -R should not be enabled by default (matching Debian 8), or at
least a confirmation/warning should be shown while installing the package.

-- System Information:
Debian Release: 9.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-6-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages atop depends on:
ii  init-system-helpers  1.48
ii  libc6                2.24-11+deb9u3
ii  libncurses5          6.0+20161126-1+deb9u2
ii  libtinfo5            6.0+20161126-1+deb9u2
ii  lsb-base             9.20161125
ii  zlib1g               1:1.2.8.dfsg-5

Versions of packages atop recommends:
ii  cron [cron-daemon]  3.0pl1-128+deb9u1

atop suggests no packages.

-- no debconf information