On 01/18/2017 03:41 AM, Sam Varshavchik wrote:
> One of my servers was a bit "unresponsive". After waiting about 20
> seconds for an ssh connection, the root shell seemed fine, but top
> showed this:
> 
> top - 06:31:36 up 3 days, 21:37,  2 users,  load average: 6.00, 6.00, 6.00
> Tasks: 294 total,   1 running, 277 sleeping,   0 stopped,  16 zombie
> %Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem :  4045896 total,  1200824 free,   346588 used,  2498484 buff/cache
> KiB Swap:  2096112 total,  2093448 free,     2664 used.  3288268 avail Mem
> 
>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 22031 root      20   0  156776   4092   3480 R   0.9  0.1   0:00.04 top
>    1 root      20   0  147032   7660   5616 D   0.0  0.2   0:14.18 systemd
>    2 root      20   0       0      0      0 S   0.0  0.0   0:00.05 kthreadd
>    3 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/0
> 
> 
> The load average was 6. But nothing was burning CPU.
> 
> After poking around, all signs were pointing to systemd doing what
> systemd does best:
> 
> # systemctl status
> Failed to read server status: Connection timed out
> 
> And the 16 zombie processes were system daemons, that should've been
> reaped by systemd.
> 
> "reboot" did nothing, of course. "reboot --force" did the trick.
> 
> Setting aside yet another systemd fiasco (on a mostly idle server that
> did absolutely nothing for the last ten hours) I'm curious as to how
> /proc/loadavg could end up reporting a load average of 6, without any
> processes seeming to be doing anything.

Your systemd was in a D (uninterruptible sleep, usually I/O wait) state.
On Linux, every task in that state counts toward the load average just
like a runnable one, even though it burns no CPU, so a steady 6.00
suggests about six tasks stuck that way. This is usually caused by
devices that aren't responding or a boatload of disk I/O; a ton of
interrupts or context switches can also make a box sluggish. In your
case it isn't interrupts (the "hi" and "si" fields of top), so it's
probably I/O or context switches. You need to run something like
"vmstat 5" to see I/O and context switches. The "bi" (blocks in) and
"bo" (blocks out) fields under "--io--" in the vmstat output show disk
I/O, the "cs" field under "--system--" shows the context switches, and
the "b" field under "procs" counts tasks in uninterruptible sleep.
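
If you'd rather see which tasks are actually sitting in D state, here
is a rough Python sketch that reads /proc directly. It assumes a Linux
/proc layout; the function names (parse_stat, blocked_tasks,
parse_loadavg) are made up for illustration, and it only looks at each
process's main thread, so a blocked worker thread inside an otherwise
sleeping process would not show up.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, _, rest = line.partition("(")
    comm, _, tail = rest.rpartition(")")
    return int(left), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """List (pid, comm) for every process whose main thread is in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            path = os.path.join(proc, entry, "stat")
            with open(path, errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited (or stat was unreadable) mid-scan
        if state == "D":
            found.append((pid, comm))
    return found


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages as floats."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


# Only runs on a machine that actually has /proc/loadavg.
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load average:", parse_loadavg(f.read()))
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Run it as root a few times in a row while the load is high; tasks that
show up in every pass are the ones pinning the load average.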

And yes, I agree...systemd is a spectacular failure.
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ri...@alldigital.com -
- AIM/Skype: therps2        ICQ: 226437340           Yahoo: origrps2 -
-                                                                    -
-   Let us think the unthinkable. Let us do the undoable. Let us     -
-   prepare to grapple with the ineffable itself, and see if we may  -
-                      not eff it up after all.                      -
-                                                 -- Douglas Adams   -
----------------------------------------------------------------------
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org