Re: high load ?

Yedidyah Bar-David Wed, 30 Mar 2005 02:54:34 -0800

On Wed, Mar 30, 2005 at 10:36:07AM +0200, Shachar Shemesh wrote:
> Yedidyah Bar-David wrote:
> 
> >>You may have to tweak the numbers a bit, but it seems about right. A 
> >>different question is whether, under this scenario, the load average is 
> >>still the right metric to look at? I think it is. If the load average is 
> >>2, my shell still has quite a queue to wait in before being actively 
> >>processed, and the responsiveness will depend on the time slices allotted 
> >>to each process.
> >>   
> >>
> >
> >This is so only if you have load only on the CPU.
> >
> I'm sorry, I don't see how the number of CPUs enters into it. I'm not 
> sure whether the load average should be divided by the number of CPUs 
> or not in order to get a consistent number, but that's beside the 
> point. I was merely trying to show a situation in which the CPU is 
> relatively idle, but the load average is high, and then ask whether we 
> are still more interested in the load or the idle time.


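I never talked about the number of CPUs in this thread. By "load only
on the CPU" I meant CPU load as opposed to load on other resources,
such as memory or disk.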

> 
> >If, for example, you
> >only have one process running, but which does a lot of paging, your
> >load average will be <=1, but the responsiveness will be quite bad,
> >as your shell will probably be paged out when you press a key.
> > 
> >
> Not if I repeatedly hit keys. That said, obviously both load average and 

Even if you repeatedly hit keys, you might have bad responsiveness due
to thrashing. The VM subsystem did get better from 2.2 through 2.4 to
2.6, but it still can't be perfect.
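
One rough way to check whether a machine is actually swapping is to watch
the swap-in/swap-out counters over a short interval. The Python sketch
below assumes a 2.6-style /proc/vmstat with "pswpin" and "pswpout" lines;
the function names swap_counters and swap_rate are made up for this
illustration, and no threshold for "thrashing" is implied.

```python
def swap_counters(vmstat_text):
    """Pick the pswpin/pswpout counters out of /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in/out per second between two samples."""
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / seconds
        for name in ("pswpin", "pswpout")
    }
```

Feed it the contents of /proc/vmstat read twice, a few seconds apart.
Rates that stay high while you type suggest that the slow shell is a
paging problem rather than a CPU one.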

> CPU state are CPU metrics. Total system performance is affected by other 
> factors, such as disk speed, memory usage, network latency etc. These 
> are not reflected in the load average at all. If a process is waiting 
> for disk or network it won't be in the "Ready" queue. I'm not sure about 
> swap though. As for CPU stats, it may be somewhat reflected in the 
> amount of time the CPU spent in system vs. user, but not in the idle count.
> 
> >OTOH, today's CPUs are very fast, and in most cases, if you run a few
> >'for (;;);' in the background, an interactive shell user won't notice,
> >while the load average will equal the number of such loops.
> > 
> >
> What does CPU speed have to do with it? A faster CPU will perform many 
> more no-op loops than a slow one, but would still take 100% CPU. The 
> scheduler today may be smarter and give my shell a higher priority due 
> to its interactiveness, but that, again, has nothing to do with the 
> CPU's speed. It seems to me that the only thing related to speed 
> affecting the perceived responsiveness of my shell is going to be the 
> time slice allotted to each running process.

IIRC you can change the time slice independently of CPU speed or
architecture. It's true that some architectures define it to be
smaller by default. But I don't think that's the point. The main
thing you get with a fast CPU is that context switches are faster.
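
As for the busy loops quoted above: the claim that the load average ends
up equal to the number of such loops can be sketched numerically. This is
a rough Python model, not the kernel's code. It assumes the 1-minute load
average is an exponentially damped average of the number of runnable
processes, sampled about every 5 seconds; next_load and simulate are names
made up for this illustration.

```python
import math

SAMPLE_INTERVAL = 5.0  # assumed seconds between samples
WINDOW = 60.0          # the "1 minute" averaging window


def next_load(load, runnable, interval=SAMPLE_INTERVAL, window=WINDOW):
    """One update step of an exponentially damped moving average."""
    decay = math.exp(-interval / window)
    return load * decay + runnable * (1.0 - decay)


def simulate(runnable, seconds, load=0.0):
    """Load average after `seconds` with a constant number of runnable tasks."""
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        load = next_load(load, runnable)
    return load
```

With three loops running, simulate(3, 60) gives roughly 1.9 and
simulate(3, 600) is within 0.01 of 3. Note that CPU speed does not appear
anywhere in the model, only the count of runnable processes.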

> 
> >Much more annoying for our own users is amd mounts that sometimes take
> >a lot of time or get stuck. I know this isn't kernel-related, but it does
> >hurt responsiveness a lot.
> > 
> >
> A. See above for non-CPU related problems.
> B. I know that the NFS code used to have TONS of locks held for far far 
> far too long. Any network latency spewed processes in "D" state by the 
> bucket load. That is very much kernel related. I don't know how things 
> are today.

I did not check thoroughly, but I do have a feeling things are getting
better over time. But other things changed here besides kernel versions.

> 
> >I personally do not know any single number that can tell on unix what
> >the expected responsiveness is.
> > 
> >
> RTFM time(1)

time(1) measures throughput, not responsiveness. IIRC people did write
benchmarks that measure responsiveness, and I agree that, at the bottom
line, they give you a single number, but they are (probably) heavy to
run. They are benchmarks. What I meant to say was that among the things
that unix counts all the time (and which are therefore cheap to read),
no single one is enough to tell how responsive the system is. Load
average is the common number for this task, and is often good enough,
but as I said, not always. In general you have to look at vmstat, the
CPU state(s), etc. to get a picture of what's going on.
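
A minimal sketch of reading a couple of those cheap counters together, in
Python. It assumes Linux-style /proc/loadavg and /proc/stat text; the
function names are made up for this illustration. The irq/softirq columns
are ignored for brevity, and older kernels without an iowait column are
treated as reporting 0 for it.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait")


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_cpu_line(stat_text):
    """Return the aggregate CPU tick counters from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            counts = [int(v) for v in parts[1:1 + len(FIELDS)]]
            counts += [0] * (len(FIELDS) - len(counts))  # pad missing columns
            return dict(zip(FIELDS, counts))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Fraction of time spent in each CPU state between two samples."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: delta[name] / total for name in FIELDS}
```

Take two /proc/stat samples a few seconds apart and compare the shares
with the load average. A load average below 1 together with a large
iowait share would match the paging example: the CPU is mostly waiting,
yet the shell feels slow.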
-- 
Didi

