I've got an Icinga 1.8 installation that has the following latencies:

    Active Service Latency: 0.000 / 1390.955 / 1077.921 sec
    Active Host Latency:    0.000 / 1383.793 / 1351.764 sec

It monitors 633 hosts and 3,800 active service checks. The server is Red Hat Linux running as a virtual machine on a hypervisor. Yes, I know we shouldn't run Icinga on a VM, but before you assume this is the problem, note that there are no bottlenecks whatsoever on the server: none in I/O, memory, or CPU. As you can see from vmstat, the steal-time column (st) is 0, so the hypervisor isn't taking CPU time away from this guest:

    :/usr/local/icinga/etc# vmstat 1
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
     3  0   8536 1065916 457056 4057120    0    0     0    25    0    0  6 26 68  0  0
     1  0   8536 1077524 457056 4057080    0    0     0   288 3722 1286 10  7 83  0  0
     1  0   8536 1074784 457056 4059056    0    0     0     0 5255 6933 15  8 77  0  0

CPU is healthy (idle most of the time), load average is low:

    /usr/local/icinga/etc# uptime
     10:19:45 up 254 days,  9:24,  4 users,  load average: 1.65, 1.64, 1.61

Plenty of RAM:

    :/usr/local/icinga/etc# free
                 total       used       free     shared    buffers     cached
    Mem:       8028216    6944252    1083964          0     457092    4068612
    -/+ buffers/cache:    2418548    5609668
    Swap:      4194296       8536    4185760

-----

I have been trying to reduce this latency so that all checks complete within roughly a 5-minute timeframe, but I have been unable to get anything better than the 1000-second latencies seen above.

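To put the 5-minute target in numbers, here is a rough back-of-the-envelope sketch. It assumes every host and service check runs exactly once per window, which is a simplification (real check intervals vary), and the function names are just for illustration. The 501 is the max_concurrent_checks value from my icinga.cfg.

```python
# Rough throughput needed to fit every check into the target window.
# Host/service counts are from this post; "each check runs once per
# 300-second window" is an assumption, not something Icinga guarantees.

def required_checks_per_second(hosts, services, window_seconds):
    """Average check starts per second needed to run each check once per window."""
    return (hosts + services) / window_seconds

def max_avg_check_duration(concurrency, rate_per_second):
    """Longest average check duration that the given concurrency can sustain
    at that rate (Little's law: concurrency = rate * duration)."""
    return concurrency / rate_per_second

rate = required_checks_per_second(hosts=633, services=3800, window_seconds=300)
budget = max_avg_check_duration(concurrency=501, rate_per_second=rate)

print(f"required rate: {rate:.2f} checks/sec")
print(f"average duration budget at 501 concurrent checks: {budget:.1f} sec")
```

If this arithmetic is right, the required rate is modest, so raw check volume alone would not seem to explain latencies above 1000 seconds.
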
Things I have done:

* status.dat, objects.cache and spool/checkresults are on a ramdisk
* Running Icinga with renice -15
* Pertinent icinga.cfg lines are as follows:

    status_update_interval=14
    use_syslog=0
    service_inter_check_delay_method=n
    max_service_check_spread=5
    service_interleave_factor=633   # number of hosts monitored
    host_inter_check_delay_method=n
    max_host_check_spread=5
    max_concurrent_checks=501
    check_result_reaper_frequency=3
    max_check_result_reaper_time=20
    cached_host_check_horizon=60
    cached_service_check_horizon=60
    sleep_time=0.02
    check_service_freshness=1
    service_freshness_check_interval=60
    check_host_freshness=1
    host_freshness_check_interval=60
    service_check_timeout=80
    host_check_timeout=30
    event_handler_timeout=90
    notification_timeout=120
    ocsp_timeout=7
    perfdata_timeout=5
    enable_embedded_perl=0
    use_embedded_perl_implicitly=0
    use_retained_program_state=1
    use_retained_scheduling_info=0

The above is the current configuration with the 1000-second latencies.

For my debug_level I'm using 16|8|64. I'm not seeing anything useful in the debug file, perhaps because I don't know what I'm looking for, or perhaps because it contains nothing informative regarding performance.

In my log file, there are no "orphaned" checks. In the past 10 hours there have been a handful of warnings about "Breaking out of check result reaper", but nothing else to indicate performance issues.

So I have no idea what's going on or why I'm not getting better latencies. Any thoughts?
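
In case per-check numbers would help, below is a minimal sketch of how they could be pulled out of status.dat to see which checks run longest. It assumes the usual layout of `hoststatus { ... }` and `servicestatus { ... }` blocks containing `key=value` lines, including `check_execution_time` and `check_latency`. The helper names and the sample data are invented for illustration; the sample is not output from my server.

```python
import io

def parse_status_blocks(lines, block_types=("hoststatus", "servicestatus")):
    """Yield (block_type, fields) for each matching 'type { key=value ... }' block."""
    block_type, fields = None, None
    for raw in lines:
        line = raw.strip()
        if block_type is None:
            # Look for the start of a block we care about; skip everything else.
            if line.endswith("{") and line[:-1].strip() in block_types:
                block_type, fields = line[:-1].strip(), {}
        elif line == "}":
            yield block_type, fields
            block_type, fields = None, None
        elif "=" in line:
            # Split on the first '=' only; plugin output may contain more.
            key, _, value = line.partition("=")
            fields[key] = value

def slowest_checks(blocks, top=10):
    """Return up to `top` tuples (execution_time, latency, host, service), slowest first."""
    rows = [
        (
            float(f.get("check_execution_time", 0)),
            float(f.get("check_latency", 0)),
            f.get("host_name", "?"),
            f.get("service_description", "(host check)"),
        )
        for _, f in blocks
    ]
    rows.sort(reverse=True)
    return rows[:top]

# Invented sample data in the assumed status.dat layout.
SAMPLE = """\
info {
\tversion=1.8.0
\t}

hoststatus {
\thost_name=web01
\tcheck_execution_time=4.010
\tcheck_latency=1351.000
\t}

servicestatus {
\thost_name=web01
\tservice_description=HTTP
\tcheck_execution_time=0.250
\tcheck_latency=1077.000
\tplugin_output=HTTP OK: time=0.2s
\t}

servicestatus {
\thost_name=db01
\tservice_description=Disk
\tcheck_execution_time=79.900
\tcheck_latency=1390.000
\t}
"""

result = slowest_checks(parse_status_blocks(io.StringIO(SAMPLE)))
for exec_time, latency, host, service in result:
    print(f"{exec_time:8.3f}s exec  {latency:9.3f}s latency  {host} / {service}")
```

To run it against a real file, pass an open file handle to `parse_status_blocks` instead of the `io.StringIO` sample. Checks whose execution time sits close to service_check_timeout or host_check_timeout would be the first ones I'd look at.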