Glen Barber wrote:
Hi, Matthew

On Sun, May 24, 2009 at 3:46 AM, Matthew Seaman
<m.sea...@infracaninophile.co.uk> wrote:
Yuri wrote:

[snip]

Sure. This is not an uncommon occurrence really.  The load average is
the number of processes in the queue for a CPU time slice, averaged
over 1, 5 and 15 minutes.  For multi-core systems the LA is scaled by
the number of cores, so an LA of 1.0 means all cores have active
processes pretty much continually.


I thought that on a dual-core, for example, a load average of 1.00
would indicate 50% CPU utilization overall (one process keeping one
core busy)[1].  2.00 on a dual-core would be 100%, 3.00 on a dual-core
would be 100% utilization with one process always waiting in the run
queue, and so on.

It seems both conventions have been used in different OSes, which is
confusing.  A quick test with a single-threaded process that spins one
CPU on a multi-core FreeBSD box shows the value is /not/ scaled by the
number of cores.
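
The test was nothing fancier than a single-threaded busy loop.
Something like the sketch below (an illustration, not necessarily the
exact program I ran) will reproduce it if you leave it running while
watching top(1) or uptime(1):

    /* Spin one CPU: pure computation, no IO, never blocks.  Watch the
     * 1-minute load average climb towards 1.00; kill it with ^C. */
    #include <stdint.h>

    int
    main(void)
    {
            volatile uint64_t n = 0;

            for (;;)
                    n++;
            /* NOTREACHED */
    }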

Which means the LA the OP was talking about is actually a lot less
alarming than it first appears.  It's clear from the top output that
his machine has at least 8 cores, so an LA of 7 means it is really not
very heavily loaded.
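
If you want to see the numbers the kernel actually reports, a short C
sketch along these lines (untested, using getloadavg(3) and sysconf(3))
prints the 1, 5 and 15 minute averages and does the per-core division
for you -- FreeBSD won't do that scaling itself:

    #include <stdio.h>
    #include <stdlib.h>             /* getloadavg(3) */
    #include <unistd.h>             /* sysconf(3) */

    int
    main(void)
    {
            double la[3];
            long ncpu;

            /* The three figures are the 1-, 5- and 15-minute averages. */
            if (getloadavg(la, 3) == -1) {
                    perror("getloadavg");
                    return (1);
            }

            ncpu = sysconf(_SC_NPROCESSORS_ONLN);
            if (ncpu < 1)
                    ncpu = 1;

            printf("load averages: %.2f %.2f %.2f across %ld cores\n",
                la[0], la[1], la[2], ncpu);
            printf("per core (1 min): %.2f\n", la[0] / ncpu);
            return (0);
    }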

Now, you might think that an active process will take the CPU
utilisation to 100%, but that is not necessarily so.  Some numerical
applications can do that, but purely CPU-bound processes are
relatively uncommon in everyday usage.  In practice the processor has
to retrieve the data it operates on from somewhere, and there is a
hierarchy of data stores of various speeds (latency, rather than
bandwidth):

 L1 Cache > L2 Cache > L3 Cache > Main RAM > Disk > Network
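
If you want to see the latency difference for yourself, a toy
pointer-chaser along these lines should show it (a rough sketch only,
not a proper benchmark -- the buffer sizes are just guesses at
"comfortably inside L2" and "far bigger than any cache"):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Chase a chain of indices through the buffer and report
     * nanoseconds per access. */
    static double
    chase(size_t nelem, size_t iters)
    {
            size_t *next = malloc(nelem * sizeof(*next));
            size_t i, pos = 0;
            struct timespec t0, t1;

            if (next == NULL)
                    return (-1.0);

            /* One big cycle with a large prime step, so successive
             * accesses land far apart in memory. */
            for (i = 0; i < nelem; i++)
                    next[i] = (i + 9973) % nelem;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (i = 0; i < iters; i++)
                    pos = next[pos];
            clock_gettime(CLOCK_MONOTONIC, &t1);

            free(next);
            /* Use pos so the loop cannot be optimised away. */
            return (pos > nelem ? -1.0 :
                ((t1.tv_sec - t0.tv_sec) * 1e9 +
                (t1.tv_nsec - t0.tv_nsec)) / iters);
    }

    int
    main(void)
    {
            /* 128 KiB of indices: cached after the first pass. */
            printf("small buffer: %.1f ns/access\n",
                chase(16 * 1024, 10000000));
            /* 256 MiB of indices: mostly main-RAM (and TLB) latency. */
            printf("large buffer: %.1f ns/access\n",
                chase(32 * 1024 * 1024, 10000000));
            return (0);
    }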


Does this affect the load average though?  My understanding was that
if the CPU cannot get at the data immediately, the process is put in
the wait queue until the L2 cache (then RAM, and so on) returns the
data to be processed.

Yes it does: when a process is on the CPU and blocks waiting for IO,
it does not necessarily yield the CPU to another process.  It depends
on timescales -- obviously if the CPU would have to wait milliseconds
for data, it makes no sense to hold up other processes.  Waiting a few
microseconds is a different matter though: it can take that long just
to load the L2/L3 cache with that process's working data, so yielding
the CPU for that sort of delay would mean the process never really got
run, which is counter-productive.  It helps if the working set is
already in the L3 cache -- so having the correct amount[*] of cache
RAM available is an important design criterion.  It's something Intel
was shown to have got wrong with some of the Pentium-series chips,
when a low-powered Pentium M designed for mobile use smoked a much
higher-clocked Pentium chip designed for all-out server use, simply
because it had about 4x as much cache.
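
To put a rough number on what yielding the CPU actually costs, a
sketch like the one below asks nanosleep(2) for a one-microsecond wait
and measures how long really elapses.  On a typical system the answer
is dominated by scheduler and timer granularity (kern.hz and friends)
rather than the microsecond you asked for, which is exactly why giving
up the CPU for waits on that scale doesn't pay:

    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
            struct timespec req = { 0, 1000 };      /* ask for 1000 ns */
            struct timespec t0, t1;
            double total_ns = 0.0;
            int i, rounds = 1000;

            for (i = 0; i < rounds; i++) {
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    nanosleep(&req, NULL);
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    total_ns += (t1.tv_sec - t0.tv_sec) * 1e9 +
                        (t1.tv_nsec - t0.tv_nsec);
            }

            printf("asked for 1000 ns, got %.0f ns per sleep on average\n",
                total_ns / rounds);
            return (0);
    }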

        Cheers,

        Matthew

[*] i.e. as much as possible.

--
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
                                                  Kent, CT11 9PW
