Package: atop
Version: 2.2.6-4
Severity: important

Dear Maintainer,


Context: Running a bare-metal server with 512 GB RAM, 10 SSD disks, 16
cores and MariaDB 10.1.
This database is part of a farm with two other hosts, which run
Debian 8.

The running kernel is: 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3
(2018-03-02)

This host has been suffering mysterious connection errors every
10 minutes.
During those errors, the following facts were observed:

* Connection errors (hosts not being able to connect to MySQL)
* Packet drops
* Multiple cores going to 100% usage for a few seconds every 10 minutes
* The other hosts in the farm (which receive the same traffic), running
  Debian 8, had no issues whatsoever

This is the output of mpstat -P ALL 1 during the errors (filtered to show
only the CPUs going to 100% usage):

02:20:04 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:05 PM   11  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

02:20:05 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:06 PM    2    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

02:20:07 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:08 PM    0    0.00    0.00   99.00    0.00    0.00    1.00    0.00    0.00    0.00    0.00

02:20:08 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:09 PM   13   95.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00    0.00    3.00

02:20:09 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:10 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

02:20:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:11 PM    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:11 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:20:12 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

02:20:12 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:13 PM   13  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

02:20:13 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:20:14 PM   13   91.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    9.00
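The filtering of the mpstat output described above can be reproduced with a short script. This is a minimal Python sketch, not part of atop or sysstat: it assumes the column layout shown above (one header row per interval containing CPU and %idle), treats a CPU as busy when 100 - %idle reaches a threshold, and the 90% threshold and the function name busy_cpus are arbitrary choices for illustration.

```python
import sys


def busy_cpus(lines, min_busy=90.0):
    """Yield (timestamp, cpu, busy_percent) for per-CPU rows of mpstat output
    whose busy share (100 - %idle) is at least min_busy."""
    header = None
    for line in lines:
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            # Header row: remember the column names for the rows that follow.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # blank line, "Average:" summary, or unrelated text
        row = dict(zip(header, fields))
        if row["CPU"] == "all":
            continue  # skip the aggregate row, keep individual CPUs only
        try:
            busy = 100.0 - float(row["%idle"])
        except ValueError:
            continue  # not a data row after all
        # The timestamp spans the fields before the CPU column ("02:20:05 PM").
        timestamp = " ".join(fields[: header.index("CPU")])
        if busy >= min_busy:
            yield timestamp, row["CPU"], busy


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 busy_cpus.py mpstat.log
    with open(sys.argv[1]) as log:
        for ts, cpu, busy in busy_cpus(log):
            print(f"{ts}  CPU {cpu:>3}  busy {busy:6.2f}%")
```

For example, after saving ten minutes of samples with mpstat -P ALL 1 600 > mpstat.log, running python3 busy_cpus.py mpstat.log prints only the saturated CPUs. Using %idle rather than %usr also catches CPUs that are pinned in %sys, such as CPU 2 and CPU 0 above.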

And this was captured with top during each atop execution:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9476 root       0 -20   29232  12300   3540 R  35.1  0.0   5:44.59 atop

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9476 root       0 -20   29232  12300   3540 R  99.3  0.0   5:47.60 atop

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9476 root       0 -20   31680  14672   3540 S   5.0  0.0   5:47.75 atop
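Since TIME+ is the cumulative CPU time of the process, the difference between consecutive snapshots shows how much CPU atop itself consumed in between. The sketch below computes this, assuming the "minutes:seconds.hundredths" form that top uses for short TIME+ values. The report does not state the top refresh interval, so the 3-second value (top's default delay) is an assumption: with it, roughly 3 CPU-seconds in 3 seconds corresponds to about one full core, which is consistent with the 99.3% reading, and 0.15 s corresponds to about the 5.0% reading.

```python
def time_plus_to_seconds(value):
    """Convert top's TIME+ value in "minutes:seconds.hundredths" form to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


# TIME+ of the atop process (PID 9476) in the three top snapshots above.
snapshots = ["5:44.59", "5:47.60", "5:47.75"]
totals = [time_plus_to_seconds(s) for s in snapshots]

# CPU seconds consumed by atop between consecutive snapshots.
deltas = [round(later - earlier, 2) for earlier, later in zip(totals, totals[1:])]
print(deltas)  # [3.01, 0.15]

# Hypothetical: top's default 3-second delay; the actual interval is not in the report.
ASSUMED_REFRESH = 3.0
# Approximate %CPU of atop over each interval under that assumption.
print([round(100 * d / ASSUMED_REFRESH, 1) for d in deltas])
```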

On Debian 9, atop is started by default as:
/usr/bin/atop -a -R -w /var/log/atop/atop_20180418 600

Whereas on Debian 8 it is started by default as:
/usr/bin/atop -a -w /var/log/atop/atop_20180424 600

Debian 8 hosts had no issues.
The difference is the -R option which, according to the documentation,
can be a really expensive operation.
Quoting the atop documentation:
Since gathering of all values that are needed to calculate the PSIZE is
a relatively time-consuming task, the 'R'  key  (or '-R' flag) should be
active. Gathering these values also requires superuser privileges
(otherwise '?K' is shown in the output).

Starting atop on Debian 9 without -R makes the problems disappear.
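To check whether a running atop was started with -R (for example on a mixed Debian 8/9 farm), the command line of each process can be read from /proc/PID/cmdline on Linux. The following is a minimal Python sketch, not an atop feature: the helper names are hypothetical, and its option parsing is deliberately simplified. It only knows that -w takes a file name (as in the command lines above) and does not model atop's other options.

```python
import os

# Assumption: only -w (which takes a file name) is treated as taking an argument.
ARG_OPTS = {"w"}


def uses_R_flag(argv):
    """Return True if an atop argument vector enables -R, also inside combined
    short flags such as -aR."""
    skip_next = False
    for arg in argv[1:]:
        if skip_next:
            skip_next = False
            continue  # this is the value of a preceding -w
        if not arg.startswith("-") or arg == "-":
            continue  # positional argument, e.g. the 600-second interval
        for i, ch in enumerate(arg[1:]):
            if ch == "R":
                return True
            if ch in ARG_OPTS:
                # The value is the rest of this argument, or the next argument.
                if i == len(arg) - 2:
                    skip_next = True
                break
    return False


def running_atop_cmdlines(proc="/proc"):
    """Yield (pid, argv) for every running process whose program name is 'atop'."""
    if not os.path.isdir(proc):
        return  # not Linux, or /proc is not mounted
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or cmdline is not readable
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if argv and os.path.basename(argv[0]) == "atop":
            yield int(pid), argv


if __name__ == "__main__":
    for pid, argv in running_atop_cmdlines():
        status = "-R enabled" if uses_R_flag(argv) else "no -R"
        print(pid, " ".join(argv), "->", status)
```

Applied to the two default command lines above, uses_R_flag returns True for the Debian 9 one and False for the Debian 8 one.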

This is a packaging change from Debian 8 to Debian 9.
As this option can lead to serious issues, especially on high-traffic
systems, -R should not be enabled by default (as was the case on
Debian 8), or at least a confirmation/warning should be shown while
installing the package.

-- System Information:
Debian Release: 9.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-6-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages atop depends on:
ii  init-system-helpers  1.48
ii  libc6                2.24-11+deb9u3
ii  libncurses5          6.0+20161126-1+deb9u2
ii  libtinfo5            6.0+20161126-1+deb9u2
ii  lsb-base             9.20161125
ii  zlib1g               1:1.2.8.dfsg-5

Versions of packages atop recommends:
ii  cron [cron-daemon]  3.0pl1-128+deb9u1

atop suggests no packages.

-- no debconf information
