I can second Zabbix. We use it in our current setup of 100+ servers, and it works OK. You could also go with Nagios or one of its clones. One of my previous monitoring solutions handled 10000+ specialized requests/hour with the help of custom scripts in Perl and C.
One thing to consider: most monitoring solutions use a Round-Robin Database (RRD) as backend storage for time-series data. If you need fine granularity for "old" data (anything older than a few minutes or hours), avoid those setups, because RRD consolidates older samples into coarser values.

https://en.wikipedia.org/wiki/Round-Robin_Database

Regards,
Evgeniy.

On Mon, Jun 16, 2014 at 10:47 AM, Rabin Yasharzadehe <ra...@rabin.io> wrote:

> I can recommend Zabbix. I have never used it on a large network (~30 servers
> at most), but I was happy with it.
>
> - you can set the monitoring interval for each item (from 1s to days)
> - samples are stored in the DB, and graphs are plotted only when you need
>   them
> - it has built-in support for SMS and Jabber message alerts
> - it works with an agent, but also with SNMP and scripts you write yourself
>
> Note that you'll need to provide enough storage for it.
> (I think they have a formula or a calculator on their website, which you
> can use to calculate the storage you'll need.)
>
> *--Rabin*
>
> On Mon, Jun 16, 2014 at 2:12 AM, Ori Berger <linux...@orib.net> wrote:
>
>> I'm looking for a single system that can track all of a remote server's
>> health and performance status, and which stores a detailed
>> every-few-seconds history. So far, I haven't found one comprehensive system
>> that does it all. It should also trigger alarms in "bad" situations (such
>> as no disk space). Things I'm interested in (in parentheses: how I track
>> them at the moment; note that shinken is a nagios-compatible thing):
>>
>> Free disk space (shinken)
>> Server load (shinken)
>> Debian package and security updates (shinken)
>> NTP drift (shinken)
>> Service ping/reply time (shinken)
>> Upload/download rates per interface (mrtg)
>> Temperatures (sensord, hddtemp)
>> Security logs, warnings and alerts, e.g. fail2ban, auth.log (rsync of log
>> files)
>>
>> I have a few tens of servers to monitor, which I would like to do with
>> one piece of software and one console. Those servers are not all
>> physically on the same network, nor do they have a VPN (so, no UDP), but
>> tcp and ssh are mostly reliable even though they are low bandwidth.
>>
>> Please note that shinken (much like nagios) doesn't really give a good
>> visible history of the things it measures, only alerts. Also, it can't
>> really sample things every few seconds: the lowest reasonable update
>> interval (given shinken's architecture) is ~5 minutes for the things it
>> measures above.
>>
>> Any recommendations?
>>
>> Thanks in advance,
>> Ori

-- 
So long, and thanks for all the fish.
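
P.S. To illustrate the RRD point above, here is a toy sketch in Python of why round-robin storage loses detail in old data. It is only an illustration under simple assumptions, not how rrdtool or any particular monitoring tool is implemented; the names RoundRobinArchive, step, slots and demo are made up for this example. Each archive has a fixed number of slots, and every slot holds the average of `step` raw samples, so a coarse archive covers a longer period but flattens spikes.

```python
from collections import deque


class RoundRobinArchive:
    """Toy round-robin archive (hypothetical, for illustration only).

    Holds at most `slots` rows; each row is the average of `step`
    consecutive raw samples. When the archive is full, the oldest
    row is overwritten, so storage never grows.
    """

    def __init__(self, step, slots):
        self.step = step
        self.rows = deque(maxlen=slots)  # oldest row drops off automatically
        self._pending = []               # raw samples not yet consolidated

    def update(self, value):
        self._pending.append(value)
        if len(self._pending) == self.step:
            # Consolidate: keep only the average, discard the raw samples.
            self.rows.append(sum(self._pending) / self.step)
            self._pending.clear()


def demo():
    fine = RoundRobinArchive(step=1, slots=6)    # last 6 raw samples only
    coarse = RoundRobinArchive(step=6, slots=4)  # 4 rows, 6 samples each
    # 12 samples with a spike to 100 at the end of each group of 6.
    samples = [10, 10, 10, 10, 10, 100] * 2
    for value in samples:
        fine.update(value)
        coarse.update(value)
    return list(fine.rows), list(coarse.rows)


if __name__ == "__main__":
    fine_rows, coarse_rows = demo()
    print("fine  :", fine_rows)    # recent data, spike still visible
    print("coarse:", coarse_rows)  # older data, spike averaged away
```

In this sketch the fine archive still shows the most recent spike, but the first spike survives only inside a coarse average of 25, so you can no longer tell that the value ever reached 100. Real RRD setups can also keep MIN or MAX consolidations, which helps with peaks but still does not bring back the individual samples.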
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il