On Tue, Oct 13, 2009 at 1:19 PM, Rob Cherry <lo...@lxrb.com> wrote:
> I am in a situation of starting a site from scratch and thus have no
> historical helpful configurations to build from. I have just finished
> implementing a monitoring solution and now I need to tell it what I care
> about. There is obvious stuff like availability and response times, but
> other metrics have become more tricky.

I just put together a simple list for a friend. Of course, what you
monitor depends on what you want to know. In this case, I want to know
if there is a problem I need to take care of so that I don't need to
drive in to work an hour later.

--

I have centralized remote monitoring for most of my systems. I trigger
an alert if three consecutive values are out of range (checked every
60s); these are triggers that indicate an immediate problem.

  ICMP host unreachable
  CPU process queue, >3
  CPU idle, <10%
  Free Swap space, <10MB
  Free Disk Space, <20%
  Available memory, <10MB
  Network queue, >5
  Network errors, >3
  *process not running
  *process TCP port unreachable

I keep a history of those, plus several more metrics for capacity
planning.

  CPU system/user/wait time
  Network traffic in/out/total
  Disk read/write/queue
  number of processes
  CPU temperature
  UPS load

-- 
Perfection is just a word I use occasionally with mustard.
--Atom Powers--
_______________________________________________
Discuss mailing list
Discuss@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
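
For illustration only, the alert rule described in the message (fire when three consecutive 60-second samples are out of range) might be sketched as below. This is not the poster's actual configuration or any particular monitoring tool's API. The metric key names and the ConsecutiveTrigger class are hypothetical; only the numeric thresholds come from the list in the message. The ICMP and process/port checks are left out because they are up/down tests rather than numeric thresholds.

```python
from collections import defaultdict

# Thresholds from the message; the metric key names are hypothetical.
# "gt" means alert when value > limit, "lt" means alert when value < limit.
THRESHOLDS = {
    "cpu_run_queue": ("gt", 3),
    "cpu_idle_pct": ("lt", 10),
    "swap_free_mb": ("lt", 10),
    "disk_free_pct": ("lt", 20),
    "mem_available_mb": ("lt", 10),
    "net_queue": ("gt", 5),
    "net_errors": ("gt", 3),
}

CONSECUTIVE = 3  # samples in a row that must be out of range


def out_of_range(metric, value):
    """Return True if a single sample violates its threshold."""
    op, limit = THRESHOLDS[metric]
    return value > limit if op == "gt" else value < limit


class ConsecutiveTrigger:
    """Tracks per-metric streaks of out-of-range samples (hypothetical helper)."""

    def __init__(self, needed=CONSECUTIVE):
        self.needed = needed
        self.streak = defaultdict(int)

    def observe(self, metric, value):
        """Record one sample; return True only on the sample that
        completes a streak of `needed` out-of-range values."""
        if out_of_range(metric, value):
            self.streak[metric] += 1
        else:
            self.streak[metric] = 0  # any in-range sample resets the streak
        return self.streak[metric] == self.needed


if __name__ == "__main__":
    trigger = ConsecutiveTrigger()
    # One sample per polling interval (60s in the message).
    for sample in [5, 8, 50, 4, 3, 2]:
        if trigger.observe("cpu_idle_pct", sample):
            print("ALERT: cpu_idle_pct below 10% for 3 consecutive checks")
```

Requiring a streak rather than a single bad sample trades up to three polling intervals of delay for fewer false alarms from short spikes. In this sketch the alert fires once per streak; a real setup would also need a policy for repeat notifications and recovery.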