Hello, I set up a distributed Icinga2 system with a master and multiple clients (actually multiple Debian KVM VMs on the same host). The master has endpoints with the addresses of all the clients, and the clients have an endpoint with only the name of the master, set up like this:
--- zones.conf on master ---------------------------------------------------
object Endpoint "monitor.gnuviech.internal" {
  host = "10.0.0.25"
}

object Endpoint "ldap.gnuviech.internal" {
  host = "10.0.0.11"
}

object Endpoint "mq.gnuviech.internal" {
  host = "10.0.0.17"
}

object Zone "master" {
  endpoints = [ "monitor.gnuviech.internal" ]
}

object Zone "ldap.gnuviech.internal" {
  endpoints = [ "ldap.gnuviech.internal" ]
  parent = "master"
}

object Zone "mq.gnuviech.internal" {
  endpoints = [ "mq.gnuviech.internal" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}
----------------------------------------------------------------------------

--- zones.conf on client ldap ----------------------------------------------
object Endpoint "monitor.gnuviech.internal" {
}

object Endpoint "ldap.gnuviech.internal" {
  host = "10.0.0.11"
}

object Zone "master" {
  endpoints = [ "monitor.gnuviech.internal" ]
}

object Zone "ldap.gnuviech.internal" {
  endpoints = [ "ldap.gnuviech.internal" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}
----------------------------------------------------------------------------

This setup works fine initially. The master connects to all the clients as
expected and service checks are executed successfully.

I maintain the /etc/icinga2/zones.d directory in a git repository, and after
fetching new configuration I reload the Icinga2 master (the commands are in
the P.S. below). Unfortunately, this seems to break the cluster
communication. I have a service check

--- cluster service check --------------------------------------------------
object Service "cluster" {
  check_command  = "cluster"
  check_interval = 5s
  retry_interval = 1s
  host_name      = "monitor.gnuviech.internal"
}
----------------------------------------------------------------------------

that becomes critical after the reload. The number of disconnected clients is
not always the same. The only way to get this sorted out is to stop the
Icinga2 master, wait a few seconds, and start it again; systemctl restart
icinga2 is not sufficient.

The master log has entries like:

[2017-07-13 12:13:17 +0200] information/JsonRpcConnection: Reconnecting to API endpoint 'mq.gnuviech.internal' via host '10.0.0.17' and port '5665'
[2017-07-13 12:13:17 +0200] information/JsonRpcConnection: Reconnecting to API endpoint 'ldap.gnuviech.internal' via host '10.0.0.11' and port '5665'
[2017-07-13 12:13:17 +0200] critical/TcpSocket: Invalid socket: Connection refused
[2017-07-13 12:13:17 +0200] critical/TcpSocket: Invalid socket: Connection refused

This seems strange to me because the icinga2 processes on these endpoints are
not changed during the reload. I would expect the master to simply reconnect
to them, continue with config synchronisation (for global-templates), and
then start sending check execution commands again.

Do you have any idea what might be wrong with my setup? Did I encounter a bug,
or is this a common misconfiguration? Why would the master get a "Connection
refused" response?

Best regards
Jan Dittberner

--
Jan Dittberner - Debian Developer
GPG-key: 4096R/0xA73E0055558FB8DD 2009-05-10
         B2FF 1D95 CE8F 7A22 DF4C F09B A73E 0055 558F B8DD
https://jan.dittberner.info/
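
P.S.: For completeness, the deploy/reload step after a configuration change
and the stop/start workaround look roughly like this. This is only a sketch
of my setup: the git checkout lives directly in /etc/icinga2/zones.d, and the
10 second pause is simply what happens to work for me.

--- config deployment and workaround ----------------------------------------
# pull the new configuration into the zones.d git checkout and reload
cd /etc/icinga2/zones.d && git pull
systemctl reload icinga2

# workaround after the cluster breaks: a plain restart is not enough,
# only a full stop, a short wait, and a fresh start recovers it
systemctl stop icinga2
sleep 10
systemctl start icinga2
----------------------------------------------------------------------------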