Rob Cherry wrote:
> I am in a situation of starting a site from scratch and thus have no
> historical helpful configurations to build from. I have just finished
> implementing a monitoring solution and now I need to tell it what I
> care about. There is obvious stuff like availability and response
> times, but other metrics have become more tricky.
>
> My historical knowledge all seems increasingly irrelevant too. For
> example, checking free memory on a modern Solaris 10 box makes no
> sense - all the memory is "stolen" by the kernel for aggressive ZFS
> caching and given up when applications need it.
>
> Given this and other considerations, I will kick off a list of what I
> intend to monitor, but I would be very curious to know what everyone
> else is doing and whether they agree/disagree with the list -
>
>
> Solaris/Linux
> - / % usage
> - /tmp % usage
> - swap % usage
> - CPU load
> - Overall response times for various services such as http/ssh etc.
>
> Windows
> - %SYSTEMROOT% % free
> - memory % free
> - CPU % free
> - Response times on well known ports
>
> Cisco/Networking equipment
> - Internal temperatures
> - CPU/Memory utilization
>
> UPS
> - Output load average
> - Battery % Charge
> - Minutes remaining / am I on battery
>
> Any glaring omissions?
>
> Rob

Maybe not glaring, but if you're collecting host stats, I'd put in a
plug for memory paging stats (page-in, page-out). If you're worried
about swap usage on Solaris, you might as well also watch the
short-term memory shortfall column (de) in the vmstat memory stats;
anything above 0 is considered bad. Also watch the scan rate (sr).
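
A rough sketch of how the de/sr check could be scripted, assuming the
monitoring system can run a plugin that reads captured Solaris vmstat
output. The function name paging_alerts, the sr_threshold parameter, and
the sample numbers are all made up for illustration; only the de and sr
column names come from vmstat itself.

```python
# Hypothetical sample of Solaris `vmstat` output; the numbers are invented.
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 1048576 524288 0 1 0 0 0 0 0 0 0 0 0 410 320 210 1 1 98
 0 0 0 1048000 12000 5 40 8 120 300 64 850 0 0 0 0 600 500 400 10 15 75
"""


def paging_alerts(vmstat_text, sr_threshold=0):
    """Return one dict per sample where de > 0 or sr > sr_threshold.

    sr_threshold is an assumed tunable; 0 flags any page scanning at all.
    """
    header = None
    alerts = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # The column-name row is the one that contains both "de" and "sr".
        if "de" in fields and "sr" in fields:
            header = fields
            continue
        # Skip the group-title row and anything that is not a full data row.
        if header is None or len(fields) != len(header):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        de = int(fields[header.index("de")])
        sr = int(fields[header.index("sr")])
        if de > 0 or sr > sr_threshold:
            alerts.append({"de": de, "sr": sr})
    return alerts


if __name__ == "__main__":
    for alert in paging_alerts(SAMPLE):
        print("memory pressure:", alert)
```

The column positions are looked up from the header row rather than
hard-coded, since the number of disk columns varies between hosts.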

Cisco networking gear - why not collect interface stats like in/out
octets and/or line usage (Cisco has a private MIB for that), as well as
drops and errors per interface?
_______________________________________________
Discuss mailing list
Discuss@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/