I am starting a site from scratch, so I have no existing configurations to build from. I have just finished implementing a monitoring solution and now need to tell it what I care about. Availability and response times are the obvious candidates, but other metrics have turned out to be trickier.
My historical knowledge seems increasingly irrelevant too. For example, checking free memory on a modern Solaris 10 box makes no sense: all the memory is "stolen" by the kernel for aggressive ZFS caching and released when applications need it.

Given this and other considerations, I will kick off a list of what I intend to monitor, but I would be very curious to know what everyone else is doing and whether they agree or disagree with the list.

Solaris/Linux
- / % usage
- /tmp % usage
- swap % usage
- CPU load
- Overall response times for various services such as http/ssh etc.

Windows
- %SYSTEMROOT% % free
- memory % free
- CPU % free
- Response times on well-known ports

Cisco/Networking equipment
- Internal temperatures
- CPU/Memory utilization

UPS
- Output load average
- Battery % charge
- Minutes remaining / am I on battery

Any glaring omissions?

Rob
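P.S. To make the Solaris/Linux items a bit more concrete, here is a rough Python sketch of three of the checks: filesystem % usage, CPU load, and the time to connect to a service port. It uses only the standard library. The function names, the hosts and ports, and the 80/90 and 1.0/2.0 thresholds are all placeholders I made up for illustration, not recommendations, and the disk percentage may differ slightly from what df reports because of reserved blocks. Swap is left out because there is no portable standard-library call for it.

```python
import os
import shutil
import socket
import time


def disk_percent_used(path):
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_per_cpu():
    """Return the 1-minute load average divided by the CPU count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def tcp_connect_seconds(host, port, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def classify(value, warn, crit):
    """Map a reading to OK/WARNING/CRITICAL; higher values are worse."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Hypothetical thresholds: warn at 80% full, critical at 90% full.
    for path in ("/", "/tmp"):
        pct = disk_percent_used(path)
        print(f"{path}: {pct:.1f}% used, {classify(pct, 80, 90)}")

    # Hypothetical thresholds: load per CPU of 1.0 warns, 2.0 is critical.
    load = load_per_cpu()
    print(f"load per CPU: {load:.2f}, {classify(load, 1.0, 2.0)}")

    # Hypothetical targets: ssh and http on the local machine.
    for port in (22, 80):
        elapsed = tcp_connect_seconds("127.0.0.1", port, timeout=2.0)
        if elapsed is None:
            print(f"port {port}: no response")
        else:
            print(f"port {port}: connected in {elapsed * 1000:.1f} ms")
```

Dividing the load average by the CPU count keeps one threshold usable across boxes of different sizes. A TCP connect time only shows that the port answers, so a real http check would also want to fetch a page.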
_______________________________________________
Discuss mailing list
Discuss@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/