I am involved with setting up NetSaint monitoring of a medium-sized network.
One problem I have is determining suitable ways of monitoring system load. A machine whose server processes use 100% of a resource will have request queues that grow indefinitely (and performance will suck), so the load average doesn't seem particularly useful on its own. If a machine has a sustained load average of 3.0 from CPU operations and it has two CPUs, then that indicates a problem. If it comes from disk operations and there are four disks in a RAID-5 array, then 3.0 equals the number of non-parity stripes and the load is probably at the limit of what the array can handle. If it's half from CPU and half from disk then it shouldn't be a problem at all.

I think that perhaps a better way would be to have one test measure the amount of CPU time used (the sum of the "user" and "system" percentages of CPU usage as reported by top would do - nice time doesn't matter). Then I could have another test measure disk utilization in terms of the await, svctm, or %util fields as reported by iostat.

Any suggestions?

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page
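For the CPU side, something like the following is what I have in mind - a minimal sketch only, reading user/system/total jiffies straight from /proc/stat rather than scraping top. The function names and the warning/critical thresholds are made up for illustration, not anything NetSaint ships:

```python
#!/usr/bin/env python
# Sketch of a NetSaint-style CPU check: report user+system time as a
# percentage of all CPU time, ignoring nice time. Thresholds and
# function names are illustrative assumptions.
import time

def read_cpu_times():
    """Return (user, system, total) jiffies from the 'cpu' line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = f.readline().split()
    # /proc/stat layout: cpu user nice system idle [iowait irq softirq ...]
    values = [int(v) for v in fields[1:]]
    user, nice, system = values[0], values[1], values[2]
    return (user, system, sum(values))

def busy_percent(before, after):
    """user+system delta between two (user, system, total) samples, as a percentage."""
    delta_total = after[2] - before[2]
    if delta_total <= 0:
        return 0.0
    busy = (after[0] - before[0]) + (after[1] - before[1])
    return 100.0 * busy / delta_total

def check_cpu(warn=80.0, crit=95.0, interval=1.0):
    """Sample twice and return a NetSaint exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    before = read_cpu_times()
    time.sleep(interval)
    busy = busy_percent(before, read_cpu_times())
    if busy >= crit:
        print("CPU CRITICAL - %.1f%% user+system" % busy)
        return 2
    if busy >= warn:
        print("CPU WARNING - %.1f%% user+system" % busy)
        return 1
    print("CPU OK - %.1f%% user+system" % busy)
    return 0
```

A real plugin would end with sys.exit(check_cpu()) so NetSaint picks the status up from the exit code.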
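For the disk side, rather than shelling out to iostat, the same %util figure can be derived from /proc/diskstats - assuming a kernel that provides it. The io_ticks field (10th after the device name) counts milliseconds the device spent doing I/O, so its delta over a sampling interval gives the utilization percentage. Again a rough sketch with made-up thresholds:

```python
#!/usr/bin/env python
# Rough sketch of a disk utilization check based on /proc/diskstats
# (assumes a kernel that provides it). The io_ticks counter - the 10th
# field after the device name - is milliseconds spent doing I/O, so its
# delta over the interval corresponds to iostat's %util column.
import time

def read_io_ticks(device):
    """Return cumulative milliseconds spent doing I/O for one block device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            # layout: major minor name rd_ios rd_merges rd_sectors rd_ticks
            #         wr_ios wr_merges wr_sectors wr_ticks in_flight io_ticks ...
            if fields[2] == device:
                return int(fields[12])
    raise ValueError("no such device: %s" % device)

def util_percent(ticks_before, ticks_after, interval_ms):
    """Percentage of the interval the device was busy, capped at 100."""
    if interval_ms <= 0:
        return 0.0
    return min(100.0, 100.0 * (ticks_after - ticks_before) / interval_ms)

def check_disk(device, warn=60.0, crit=90.0, interval=1.0):
    """Sample io_ticks twice; return 0 OK, 1 WARNING, 2 CRITICAL."""
    before = read_io_ticks(device)
    time.sleep(interval)
    util = util_percent(before, read_io_ticks(device), interval * 1000.0)
    if util >= crit:
        print("DISK CRITICAL - %s %.1f%% utilised" % (device, util))
        return 2
    if util >= warn:
        print("DISK WARNING - %s %.1f%% utilised" % (device, util))
        return 1
    print("DISK OK - %s %.1f%% utilised" % (device, util))
    return 0
```

The await and svctm figures would need the request and tick counters from the same file; %util alone already answers the "is the array saturated" question.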