As a small operator, we mainly use Icinga for the reasons Chuck mentioned. The API allows us to do updates based on configuration parameters we've created in a custom MySQL database.
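
A minimal sketch of what that kind of database-driven update could look like, assuming a table whose rows carry a host name, an address, and some extra columns. The column names, the `icinga.example.net` endpoint, and the use of a `generic-host` template are illustrative assumptions, not details from our setup. The function only builds the request for the Icinga 2 REST API (`PUT /v1/objects/hosts/<name>`); sending it with credentials is left to whatever HTTP client you prefer.

```python
import json
from urllib.parse import quote

# Hypothetical API endpoint; 5665 is the default Icinga 2 API port.
ICINGA_URL = "https://icinga.example.net:5665"


def build_host_request(row, base_url=ICINGA_URL):
    """Turn one database row (a dict) into the pieces of an Icinga 2 API call."""
    attrs = {"address": row["address"], "check_command": "hostalive"}
    # Assumption: every extra column becomes a custom variable (vars.<column>).
    for key, value in row.items():
        if key not in ("name", "address"):
            attrs["vars." + key] = value
    return {
        "method": "PUT",  # PUT creates a new config object
        "url": f"{base_url}/v1/objects/hosts/{quote(row['name'], safe='')}",
        "headers": {"Accept": "application/json"},
        "body": json.dumps(
            {"templates": ["generic-host"], "attrs": attrs}, sort_keys=True
        ),
    }


# Example row, as it might come back from a MySQL cursor that returns dicts.
request = build_host_request(
    {"name": "core-sw1", "address": "192.0.2.10", "site": "sjc"}
)
print(request["url"])
print(request["body"])
```

Keeping the request construction separate from the HTTP call makes it easy to diff what the database says against what the API would be told before anything is changed.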
Peter

Peter Harrison
CTO, Colovore LLC

On Wed, Aug 15, 2018 at 9:19 AM, Chuck Anderson <c...@wpi.edu> wrote:

> On Wed, Aug 15, 2018 at 08:49:12AM -0500, Colton Conor wrote:
> > We are looking for a new network monitoring system. Since there are so many
> > operators on this list, I would like to know which NMS do you use and why?
> > Is there one that you really like, and others that you hate?
> >
> > For free options (open source), LibreNMS and NetXMS come highly recommended
> > by many wireless ISPs on low budgets. However, I am not sure of the commercial
> > options available nor their price points.
>
> For monitoring network device/interface data plane reachability with
> ping, we are still using an ancient piece of open source software
> called Autostatus. I find it invaluable for notifying us about
> reachability issues with its simple to understand parent/child
> relationships and graph-based fping methodology. It isn't perfect--it
> doesn't scale very well, it doesn't have HA/clustering, it has no
> fancy dependencies (just basic parent-child) and no event correlation,
> no contact scheduling, no API, etc. but it is very easy to understand
> why you are getting an alert or not and boiling that down to a single
> point of failure and as such it provides reliable, trustable
> information about data plane reachability from one vantage point on
> the network.
>
> For monitoring server & network service availability,
> device/environmental health, etc. we are currently using Nagios. My
> problems with it are that it has complex rules for how/when to perform
> a specific health check and send or suppress a notification (and
> perhaps bugs in our old version that never ever seems to send any Host
> notifications except when it does) and the whole idea of "suppress the
> Host check unless all Service checks for all services on the host are
> down" doesn't really fit well with the idea of monitoring
> device/interface reachability on routers & switches that make up a
> complex graph of dependencies. Trying to shoehorn Nagios into
> alerting on just the one IP address/device/interface that is causing
> all the others behind it to be unreachable doesn't work very well.
> You can't use Host Dependencies because Host checks are suppressed by
> default, and Host Dependencies don't affect Service
> Checks/notifications. Forcing Host checks to always run causes
> performance problems. Creating a "Ping" service for every host
> requires creating manual Service Dependencies between all the "Ping"
> services on every Host. Then you end up with a complex configuration
> that is very hard to understand. But for things like telling you when
> a power supply or fan has died, or if the web service crashed, it
> works well.
>
> We did a survey of a bunch of open source tools to replace Nagios and
> have settled on Icinga for its APIs, dynamic rules with pattern
> matching and boolean logic, and compatibility with Nagios plugins.
> But it still doesn't change the basic architectural choices of the
> Nagios core engine and hence isn't a good fit for network
> device/interface reachability monitoring IMO.
>
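
The parent/child idea described in the quoted message (boiling a pile of unreachable devices down to a single point of failure) can be illustrated in a few lines. This is only a sketch of the general technique, not Autostatus's actual code; the `parent` map and the host names are made up for the example. A device is reported as a root cause when it is unreachable but its parent is still reachable, or when it has no parent at all.

```python
def root_causes(parent, down):
    """Return the unreachable hosts whose parent is reachable (or absent).

    parent: dict mapping each host to its upstream parent (None for the root).
    down:   iterable of hosts that currently fail the ping check.
    """
    down = set(down)
    # A host is a root cause if nothing upstream of it is also down.
    return sorted(host for host in down if parent.get(host) not in down)


# Hypothetical topology: core -> dist1 -> acc1/acc2, core -> dist2.
parent = {
    "core": None,
    "dist1": "core",
    "dist2": "core",
    "acc1": "dist1",
    "acc2": "dist1",
}

# dist1 fails, so everything behind it is unreachable too.
print(root_causes(parent, ["dist1", "acc1", "acc2"]))  # ['dist1']
```

Only `dist1` is reported because `acc1` and `acc2` sit behind it; alerts for them would add noise without pointing at the actual failure.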