On 6 June 2016 at 07:18, Manuel Marín <m...@transtelco.net> wrote:
> Dear Nanog community
>
> We are currently planning to upgrade our monitoring system (Opsview) due to
> scalability issues and I was wondering what do you recommend for monitoring
> 5000 hosts and 35000 services. We would like to use a monitoring system
> that is compatible with the nagios plugin format, however we are not sure
> if systems like Icinga/Shinken/Op5 are the way to go.
>
> Is someone using systems like Op5 or Icinga2 for monitoring > 5000 hosts?
> Would you recommend commercial systems like Sevone, Zabbix, etc instead of
> open source ones?
>

Although I haven't ever scaled it that high, I've had a lot of luck using Gearman (mod_gearman) to make Nagios horizontally scalable. It lets you use Nagios itself only as a scheduler and reporting UI, and offload all of the actual probing to other servers. There will be a theoretical limit to how far that scales, because you still rely on a single Nagios instance to schedule the checks and receive their results, but I imagine it's much higher than your current requirements.
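
To make the split concrete, here is a rough Python sketch of the pattern: one scheduler puts checks on a queue, and a pool of workers runs the plugins and reports back. This is not the mod_gearman API or its configuration. The names `run_check`, `worker` and `dispatch` are made up for illustration, and an in-process queue with threads stands in for the Gearman job server and the remote worker hosts. The only part taken from Nagios itself is the plugin convention of exit codes 0 to 3 and a first line of status output.

```python
import queue
import subprocess
import sys
import threading

# Nagios plugin convention: the exit code carries the service state.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=10):
    """Run one plugin-style command and return (state, first output line)."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A plugin that cannot run or times out is reported as UNKNOWN.
        return "UNKNOWN", str(exc)
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    return STATES.get(proc.returncode, "UNKNOWN"), first_line


def worker(jobs, results):
    """Worker side: pull jobs until a None sentinel arrives, push results."""
    while True:
        job = jobs.get()
        if job is None:
            break
        host, service, command = job
        state, output = run_check(command)
        results.put((host, service, state, output))


def dispatch(checks, num_workers=4):
    """Scheduler side: enqueue checks, start workers, collect the results."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for check in checks:
        jobs.put(check)
    for _ in threads:
        jobs.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    return sorted(results.get() for _ in range(len(checks)))


if __name__ == "__main__":
    # Hypothetical stand-ins for real plugins such as check_ping.
    demo = [
        ("host1", "ping",
         [sys.executable, "-c", "print('PING OK'); raise SystemExit(0)"]),
        ("host2", "http",
         [sys.executable, "-c", "print('HTTP CRITICAL'); raise SystemExit(2)"]),
    ]
    for row in dispatch(demo, num_workers=2):
        print(row)
```

Adding capacity in this model means adding workers, which is the horizontal part. The `dispatch` function is the piece that stays single, and that corresponds to the one Nagios instance I mentioned as the eventual bottleneck.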