I'm looking for a single system that can track all of a remote server's health and performance metrics, store a detailed every-few-seconds history, and trigger alarms in "bad" situations (such as running out of disk space). So far, I haven't found one comprehensive system that does it all. Here are the things I'm interested in, with the tool I currently use to track each one in parentheses (note that shinken is a Nagios-compatible monitoring system):

Free disk space (shinken)
Server load (shinken)
Debian package and security updates (shinken)
NTP drift (shinken)
Service ping/reply time (shinken)
Upload/download rates per interface (mrtg)
Temperatures (sensord, hddtemp)
Security logs, warnings and alerts, e.g. fail2ban, auth.log (rsync of log files)

I have a few tens of servers to monitor, and I would like to do it with one piece of software and one console. The servers are not all physically on the same network, and there is no VPN between them (so, no UDP), but TCP and SSH are mostly reliable, even though bandwidth is low.

Please note that shinken (much like Nagios) doesn't really give a good visible history of the things it measures, only alerts. It also can't really sample things every few seconds: the lowest reasonable update interval (given shinken's architecture) is ~5 minutes for the things it measures above.
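
To illustrate what sampling "every few seconds" could look like, here is a minimal Python sketch. It assumes a Unix host with Python 3. The function names, the 5-second interval, the threshold parameter and the one-line output format are all made up for illustration; none of them come from shinken, mrtg or any other tool mentioned above.

```python
import os
import shutil
import time


def take_sample(path="/"):
    """Return one sample: timestamp, free bytes on `path`, 1-minute load."""
    usage = shutil.disk_usage(path)
    load1, _, _ = os.getloadavg()  # Unix only
    return {"ts": int(time.time()), "free_bytes": usage.free, "load1": load1}


def format_sample(sample):
    """Render a sample as one short text line, cheap to send over a slow link."""
    return "{ts} free={free_bytes} load1={load1:.2f}".format(**sample)


def is_disk_low(sample, min_free_bytes):
    """Alarm condition: less free space than a (hypothetical) threshold."""
    return sample["free_bytes"] < min_free_bytes


def run(interval_seconds=5, count=3, path="/"):
    """Print `count` samples, one every `interval_seconds` seconds."""
    for i in range(count):
        print(format_sample(take_sample(path)), flush=True)
        if i < count - 1:
            time.sleep(interval_seconds)


if __name__ == "__main__":
    run()
```

Since each sample is a single short line on standard output, something like this could in principle be run over an SSH session and its output collected on the monitoring side, which would fit the TCP/SSH-only, low-bandwidth constraint. It does not address storage, graphing or the other metrics in the list.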

Any recommendations?

Thanks in advance,
Ori

_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
