Re: How can this 'top' command output make sense? Load over 7 and total CPU use ~5%

Matthew Seaman Sun, 24 May 2009 00:47:16 -0700

Yuri wrote:

Look below: load over 7 and no processes take much CPU.
Yuri

7.2-PRERELEASE, 32-bit on i7-920.



------------------------------------------------------------
last pid: 93192; load averages: 7.68, 6.27, 4.61 up 2+03:11:29 20:25:24
204 processes: 9 running, 193 sleeping, 1 stopped, 1 zombie
CPU:  5.3% user,  0.0% nice,  0.0% system,  0.0% interrupt, 94.7% idle
Mem: 867M Active, 1684M Inact, 279M Wired, 65M Cache, 112M Buf, 92M Free
Swap: 16G Total, 142M Used, 16G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
60032 yuri          1  46    0   285M   183M select 0  41:15  0.59% Xorg
60400 yuri          1   4    0 12576K  9144K kqread 4  29:44  0.00% wineserver
92982 yuri          1  44    0 53012K 16800K CPU3   3  18:50  0.00% kdeinit4
92986 yuri          1  44    0 53012K 16800K CPU7   7  18:48  0.00% kdeinit4
92988 yuri          1 107    0 53012K 16840K CPU6   6  17:22  0.00% kdeinit4
60104 yuri          1  44    0   132M 45860K select 0  16:58  0.00% kwin
92984 yuri          1 117    0 53012K 16800K RUN    5  14:56  0.00% kdeinit4
60096 yuri          1  44    0 89732K 30040K select 4  10:10  0.00% kded4
93141 yuri          1  53    0 53012K 16800K CPU5   5   3:52  0.00% kdeinit4
93139 yuri          1  44    0 53012K 16800K CPU1   1   3:30  0.00% kdeinit4
60174 yuri          1  44    0  3168K  1400K select 0   1:28  0.00% ksysguardd
  450 root          1   4    0  3128K   800K select 4   0:44  0.00% dhclient
 1131 messagebus    1   4    0  3344K  1384K select 4   0:40  0.00% dbus-daemon


Sure. This is not an uncommon occurrence really.  The load average is
the number of processes running or queued for a CPU time slice, averaged over
the last 1, 5 and 15 minutes.  The LA is not scaled by the number of cores, so
on a machine with 8 logical CPUs a LA of around 8 means all cores have active
processes pretty much continually.
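
For what it's worth, the averaging itself is easy to sketch.  The Python below
is only an illustration, not the kernel's code: it assumes the run queue is
sampled every 5 seconds and folded into three exponentially damped averages
with 1, 5 and 15 minute time constants (the three figures top prints).  The
names update_load and simulate are made up for the example.

```python
import math

SAMPLE_INTERVAL = 5.0            # seconds between samples (assumed for this sketch)
WINDOWS = (60.0, 300.0, 900.0)   # 1, 5 and 15 minute time constants, in seconds


def update_load(loads, runnable, interval=SAMPLE_INTERVAL):
    """Fold one run-queue sample into the three damped averages."""
    new_loads = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-interval / window)
        # Old value fades by `decay`; the new sample fills in the rest.
        new_loads.append(load * decay + runnable * (1.0 - decay))
    return tuple(new_loads)


def simulate(runnable, seconds, loads=(0.0, 0.0, 0.0)):
    """Hold `runnable` processes in the queue for `seconds` and return the averages."""
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        loads = update_load(loads, runnable)
    return loads


if __name__ == "__main__":
    # Eight runnable processes appearing on an idle machine ten minutes ago.
    print(["%.2f" % value for value in simulate(8, 600)])
```

A load that has only recently built up leaves the three figures in descending
order, which is the same pattern as the 7.68, 6.27, 4.61 in the output above.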

Now, you might think that an active process will take the CPU utilisation
to 100%, but that is not necessarily so.  Some numerical applications can
do that, but purely CPU bound processes are relatively uncommon in everyday
usage.  In actuality what happens is that the processor will need to retrieve
data from somewhere to operate on.  There's a hierarchy of data stores of
various speeds (latency, rather than bandwidth):

  L1 Cache > L2 Cache > L3 Cache > Main RAM > Disk > Network

Where the L1 Cache is accessible in a few clock ticks (nanoseconds), Main RAM
takes tens to hundreds of nanoseconds, disk can take milliseconds to access,
and Network can take 10s to 1000s of milliseconds.

Or in other words, about 9 orders of magnitude difference.  So when the data
you need to process is too big to fit in the fastest caches, or when it comes
from a particularly slow location or when you have a lot of active processes
causing context switches, then the CPU core will be making frequent IO requests
and spending time waiting for them to be fulfilled.
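
To put rough numbers on that spread, here is a small Python sketch.  The
latencies are order-of-magnitude guesses chosen for the example, not
measurements, and the 2.67 GHz clock is my assumption for an i7-920;
LATENCY_SECONDS, orders_of_magnitude and cycles_waited are hypothetical names.

```python
import math

# Ballpark access latencies in seconds (assumed, order of magnitude only).
LATENCY_SECONDS = {
    "L1 cache": 1e-9,   # a few clock ticks
    "main RAM": 1e-7,   # around a hundred nanoseconds
    "disk": 1e-2,       # milliseconds
    "network": 1.0,     # up to 1000s of milliseconds
}

CLOCK_HZ = 2.67e9       # assumed clock speed for an i7-920


def orders_of_magnitude(fast, slow):
    """How many powers of ten separate two latencies."""
    return math.log10(slow / fast)


def cycles_waited(latency_seconds, clock_hz=CLOCK_HZ):
    """Clock cycles that pass while one access is outstanding."""
    return latency_seconds * clock_hz


if __name__ == "__main__":
    spread = orders_of_magnitude(LATENCY_SECONDS["L1 cache"],
                                 LATENCY_SECONDS["network"])
    print("L1 to network: about %d orders of magnitude" % round(spread))
    for name, latency in LATENCY_SECONDS.items():
        print("%-8s ~%.0f cycles" % (name, cycles_waited(latency)))
```

With these assumed figures a single trip to main RAM costs a few hundred
cycles, while a disk access costs tens of millions.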

Now, for sources like disks and network, where the retrieval is much slower than
the typical timescale of events on the CPU, the process will yield the CPU to
something else and only get a new timeslice once the IO request has been
fulfilled.  For an access to main RAM, however, that form of yielding is less
likely.  Consequently the CPU can end up waiting for 100s of clock cycles until
it gets some bytes to process.  In the meantime, other processes are also
sitting in the queue wanting CPU time slices -- hence the high LA with low CPU
utilisation.
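
If you want to compare the two figures side by side, something like the
following Python would do.  It is only a sketch: parse_top_header is a made-up
helper, the regular expressions are written against the two header lines quoted
above rather than any documented top format, and the count of 8 logical CPUs is
my assumption for an i7-920 (top does show processes on CPU1 through CPU7).  It
also assumes the load average is the raw run-queue count, not a per-core value.

```python
import re

LOGICAL_CPUS = 8  # assumed: i7-920 with Hyper-Threading enabled

HEADER = """\
last pid: 93192; load averages: 7.68, 6.27, 4.61 up 2+03:11:29 20:25:24
CPU:  5.3% user,  0.0% nice,  0.0% system,  0.0% interrupt, 94.7% idle
"""


def parse_top_header(text):
    """Return ((1min, 5min, 15min) load averages, busy CPU percentage)."""
    load_match = re.search(
        r"load averages:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", text)
    idle_match = re.search(r"([\d.]+)% idle", text)
    if load_match is None or idle_match is None:
        raise ValueError("text does not look like a top header")
    loads = tuple(float(value) for value in load_match.groups())
    busy_percent = 100.0 - float(idle_match.group(1))
    return loads, busy_percent


if __name__ == "__main__":
    loads, busy = parse_top_header(HEADER)
    print("runnable per logical CPU: %.2f" % (loads[0] / LOGICAL_CPUS))
    print("CPU busy: %.1f%%" % busy)
```

On the figures above that comes to roughly one runnable process per logical
CPU against about 5% of CPU time reported as busy.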

Scheduling CPU timeslices to make maximum use of available resources is the
difference between a really performant OS and a disaster.  A good scheduler
is the critical central piece of code around which the rest of an OS can be
constructed.  Combine that with the complexity of having multiple cores, and
the fact that threads of execution sometimes have to be moved to different
cores and on other occasions need to stick to the same core in order to make
best use of resources, and you will start to appreciate quite how hard it is to
write a good scheduler.  Unsurprisingly, the design of such things is a matter
of fairly impassioned debate amongst the rarefied circle of people capable of
writing them.  That sort of argument was the genesis of the FreeBSD /
DragonFly BSD fork a few years back.  You can rest assured though that FreeBSD
certainly does have one of the very best schedulers currently available, and it
is specifically targeted at getting the best out of the sort of multicore CPUs
available nowadays.

        Cheers,

        Matthew

--
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                 Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
                                                 Kent, CT11 9PW
