On Tue, Oct 13, 2009 at 1:19 PM, Rob Cherry <lo...@lxrb.com> wrote:
> I am in a situation of starting a site from scratch and thus have no
> historical helpful configurations to build from.  I have just finished
> implementing a monitoring solution and now I need to tell it what I care
> about.  There is obvious stuff like availability and response times, but
> other metrics have become more tricky.

I just put together a simple list for a friend. Of course, what you
monitor depends on what you want to know. In this case, I want to know
if there is a problem I need to take care of so that I don't need to
drive in to work an hour later.
--
I have centralized remote monitoring for most of my systems. I trigger
an alert if three consecutive values are out of range (checked every
60s); these are triggers that indicate an immediate problem.
ICMP host unreachable
CPU process queue, >3
CPU idle, <10%
Free Swap space, <10MB
Free Disk Space, <20%
Available memory, <10MB
Network queue, >5
Network errors, >3
*process not running
*process TCP port unreachable
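
For anyone who wants to prototype the "three consecutive out-of-range
values" rule before wiring it into a real monitoring tool, here is a rough
Python sketch. The metric names, the THRESHOLDS table, and the
ConsecutiveTrigger class are all made up for illustration and do not come
from any particular product; the limits are copied from the list above. It
assumes something else collects one sample per metric every 60s and feeds
it to observe().

```python
from collections import defaultdict

# Hypothetical metric names; the limits mirror the list above.
# Each entry returns True when a sample is out of range.
THRESHOLDS = {
    "cpu_run_queue": lambda v: v > 3,
    "cpu_idle_pct": lambda v: v < 10,
    "free_swap_mb": lambda v: v < 10,
    "free_disk_pct": lambda v: v < 20,
    "avail_mem_mb": lambda v: v < 10,
    "net_queue": lambda v: v > 5,
    "net_errors": lambda v: v > 3,
}


class ConsecutiveTrigger:
    """Fire an alert once a metric has been out of range N samples in a row."""

    def __init__(self, thresholds=THRESHOLDS, needed=3):
        self.thresholds = thresholds
        self.needed = needed
        self.streak = defaultdict(int)  # metric -> current bad-sample count

    def observe(self, metric, value):
        """Record one sample; return True only on the sample that trips the alert."""
        if self.thresholds[metric](value):
            self.streak[metric] += 1
        else:
            self.streak[metric] = 0  # any good sample resets the streak
        # Equality (not >=) means the alert fires once per streak, not on
        # every later bad sample.
        return self.streak[metric] == self.needed


if __name__ == "__main__":
    trigger = ConsecutiveTrigger()
    for sample in (5, 8, 50, 4, 3, 2):
        if trigger.observe("cpu_idle_pct", sample):
            print("ALERT: cpu_idle_pct out of range for 3 consecutive checks")
```

With more than one host you would probably key the streak on (host, metric)
rather than on the metric alone.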

I keep a history of those, plus several more metrics for capacity planning.
CPU system/user/wait time
Network traffic in/out/total
Disk read/write/queue
number of processes
CPU temperature
UPS load


-- 
Perfection is just a word I use occasionally with mustard.
--Atom Powers--

_______________________________________________
Discuss mailing list
Discuss@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
