An infrastructure (servers, storage, desktops/workstations/laptops, network, etc.) is worth nothing in itself if it does not deliver proper application response time or QoE. I have since decided to take apart all critical business apps (i.e., identify every component each one is made of) and then decide what to monitor, and how, in the underlying systems, processes, and components that make up each transaction's end-to-end "path", to the best of my ability and within the tools available. (I use the term "transaction" loosely; VoIP, of course, is among the apps that require infrastructure monitoring.) So I would answer your question by saying that it depends entirely on what you are supporting, and here is where I would start:
http://www.netqos.com/resourceroom/whitepapers/forms/handbook.html

Stefan Mititelu
http://twitter.com/netfortius
http://www.linkedin.com/in/netfortius

On Tue, Oct 13, 2009 at 3:28 PM, Atom Powers <atom.pow...@gmail.com> wrote:
> On Tue, Oct 13, 2009 at 1:19 PM, Rob Cherry <lo...@lxrb.com> wrote:
> > I am in a situation of starting a site from scratch and thus have no
> > historical helpful configurations to build from. I have just finished
> > implementing a monitoring solution and now I need to tell it what I care
> > about. There is obvious stuff like availability and response times, but
> > other metrics have become more tricky.
>
> I just put together a simple list for a friend. Of course, what you
> monitor depends on what you want to know. In this case, I want to know
> if there is a problem I need to take care of so that I don't need to
> drive in to work an hour later.
> --
> I have centralized remote monitoring for most of my systems. I trigger
> an alert if three consecutive values are out of range (checked every
> 60s); these are triggers that indicate an immediate problem.
> ICMP host unreachable
> CPU process queue, >3
> CPU idle, <10%
> Free Swap space, <10MB
> Free Disk Space, <20%
> Available memory, <10MB
> Network queue, >5
> Network errors, >3
> *process not running
> *process TCP port unreachable
>
> I keep a history of those, plus several more metrics for capacity planning.
> CPU system/user/wait time
> Network traffic in/out/total
> Disk read/write/queue
> number of processes
> CPU temperature
> UPS load
>
> --
> Perfection is just a word I use occasionally with mustard.
> --Atom Powers--
>
> _______________________________________________
> Discuss mailing list
> Discuss@lopsa.org
> http://lopsa.org/cgi-bin/mailman/listinfo/discuss
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
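For illustration, here is a minimal Python sketch of the alerting rule described in the quoted list: a metric alerts only when three consecutive samples (polled every 60 s) are out of range. The metric names (`cpu_idle_pct`, `disk_free_pct`, ...) and the `Rule`/`evaluate` helpers are hypothetical and not part of any particular monitoring product; the boolean checks (ICMP unreachable, process not running, TCP port unreachable) could be fed in as 0/1 values but are left out here.

```python
import operator
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Rule:
    """One threshold check; alerts after `consecutive` out-of-range samples in a row."""
    metric: str                               # hypothetical metric name
    breached: Callable[[float, float], bool]  # (value, threshold) -> True if out of range
    threshold: float
    consecutive: int = 3
    _recent: Deque[bool] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Keep only the breach flags of the last `consecutive` samples.
        self._recent = deque(maxlen=self.consecutive)

    def observe(self, value: float) -> bool:
        """Record one sample; True while the last `consecutive` samples all breached."""
        self._recent.append(self.breached(value, self.threshold))
        return len(self._recent) == self.consecutive and all(self._recent)


def make_rules() -> List[Rule]:
    """Thresholds from the quoted list (units assumed: MB and percent)."""
    return [
        Rule("cpu_run_queue", operator.gt, 3),
        Rule("cpu_idle_pct", operator.lt, 10),
        Rule("swap_free_mb", operator.lt, 10),
        Rule("disk_free_pct", operator.lt, 20),
        Rule("mem_available_mb", operator.lt, 10),
        Rule("net_queue", operator.gt, 5),
        Rule("net_errors", operator.gt, 3),
    ]


def evaluate(rules: List[Rule], sample: Dict[str, float]) -> List[str]:
    """Feed one polling cycle (e.g. every 60 s) to each rule; return metrics now alerting."""
    return [r.metric for r in rules if r.metric in sample and r.observe(sample[r.metric])]
```

As written, a rule keeps reporting on every cycle while the condition persists, and a single in-range sample resets the count; a real setup would probably also deduplicate notifications so the on-call person is paged once.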