I am starting a site from scratch, so I have no existing configurations to
build from.  I have just finished implementing a monitoring solution and now
I need to tell it what I care about.  Some things are obvious, like
availability and response times, but other metrics are trickier.

Much of my past experience seems increasingly irrelevant too.  For example,
checking free memory on a modern Solaris 10 box makes no sense: all the
memory is "stolen" by the kernel for aggressive ZFS caching and given back
when applications need it.

Given this and other considerations, I will kick off a list of what I intend
to monitor.  I would be very curious to know what everyone else is doing
and whether you agree or disagree with the list:


Solaris/Linux
 - / % usage
 - /tmp % usage
 - swap % usage
 - CPU load
 - Overall response times for various services such as http/ssh etc.
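
To make the filesystem and load items above concrete, here is a minimal
sketch in Python of how such checks might look.  The thresholds and the
function names are placeholders invented for illustration, not
recommendations, and the load figure is divided by CPU count on the
assumption that a per-CPU number is easier to compare across hosts.

```python
import os
import shutil

# Hypothetical thresholds, chosen only for illustration.
DISK_WARN_PERCENT = 90.0
LOAD_WARN_PER_CPU = 2.0


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def load_per_cpu():
    """Return the 1-minute load average divided by the number of CPUs."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def check_host(paths=("/", "/tmp")):
    """Return a list of warning strings; an empty list means nothing tripped."""
    warnings = []
    for path in paths:
        used = percent_used(path)
        if used >= DISK_WARN_PERCENT:
            warnings.append(f"{path} is {used:.1f}% full")
    load = load_per_cpu()
    if load >= LOAD_WARN_PER_CPU:
        warnings.append(f"load per CPU is {load:.2f}")
    return warnings


if __name__ == "__main__":
    for line in check_host():
        print("WARNING:", line)
```

Swap usage is left out on purpose, since the standard library has no
portable call for it and the right source differs between Solaris and Linux.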

Windows
 - %SYSTEMROOT% % free
 - memory % free
 - CPU % free
 - Response times on well known ports
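
For the response-time items, one rough approach is to time how long a TCP
connection takes to open.  The sketch below assumes that connect time is an
acceptable stand-in for "response time"; it does not speak HTTP or SSH, and
the function name, host and port list are hypothetical.

```python
import socket
import time


def connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None


if __name__ == "__main__":
    # Hypothetical targets; substitute real hosts and ports.
    for host, port in [("localhost", 22), ("localhost", 80)]:
        elapsed = connect_time(host, port, timeout=2.0)
        if elapsed is None:
            print(f"{host}:{port} unreachable")
        else:
            print(f"{host}:{port} connected in {elapsed * 1000:.1f} ms")
```

A real check would probably also send a request and wait for a reply, since
a port can accept connections while the service behind it is hung.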

Cisco/Networking equipment
 - Internal temperatures
 - CPU/Memory utilization

UPS
 - Output load average
 - Battery % Charge
 - Minutes remaining / am I on battery

Any glaring omissions?

Rob
_______________________________________________
Discuss mailing list
Discuss@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
