Hi,

> On 19. Sep 2017, at 17:15, Jered Floyd <je...@convivian.com> wrote:
>
> Michael,
>
> Excellent intuition! This looks very much like an issue with the
> InfluxdbWriter queue. It looks like Icinga loses the connection and doesn't
> attempt to reconnect, but queues up all the data indefinitely. TLS is
> enabled, and the configuration is below. I'm guessing this is
> https://github.com/Icinga/icinga2/issues/5469 ?
Yep, my question was to find out if TLS is used, so I could point you to the
issue. 2.7.1 is going to be released tomorrow, if nothing else happens.

For keeping an eye on the work queue in the meantime, a small curl sketch for
the /v1/status query is appended at the bottom of this mail.

Kind regards,
Michael

>
> Regards,
> --Jered
>
> From the most recent instance (filtered to InfluxDB-related messages):
>
> [2017-09-18 10:13:11 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 6.03333/s (362/min 1814/5min 5446/15min);
> [2017-09-18 10:18:21 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 6.01667/s (361/min 1813/5min 5443/15min);
> [2017-09-18 10:23:31 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 5.98333/s (359/min 1814/5min 5441/15min);
> [2017-09-18 10:26:58 -0400] warning/InfluxdbWriter: Response timeout of TCP socket from host '127.0.0.1' port '8086'.
> [2017-09-18 10:28:21 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 10, rate: 6.01667/s (361/min 1810/5min 5440/15min);
> [2017-09-18 10:28:31 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 68, rate: 6.01667/s (361/min 1810/5min 5440/15min); empty in 11 seconds
> [2017-09-18 10:28:41 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 132, rate: 6.01667/s (361/min 1810/5min 5440/15min); empty in 20 seconds
> [2017-09-18 10:28:51 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 200, rate: 6.01667/s (361/min 1810/5min 5440/15min); empty in 29 seconds
>
> ... and the queue keeps growing from there. There are no errors noted in the
> InfluxDB logs.
>
> /etc/icinga2/features-enabled/influxdb.conf:
>
> /**
>  * The InfluxdbWriter type writes check result metrics and
>  * performance data to an InfluxDB HTTP API
>  */
>
> library "perfdata"
>
> object InfluxdbWriter "influxdb" {
>   host = "127.0.0.1"
>   port = 8086
>   ssl_enable = true
>   database = "icinga2"
>   username = "icinga2"
>   password = "REDACTED"
>
>   enable_send_thresholds = true
>   enable_send_metadata = true
>
>   host_template = {
>     measurement = "$host.check_command$"
>     tags = {
>       hostname = "$host.name$"
>     }
>   }
>   service_template = {
>     measurement = "$service.check_command$"
>     tags = {
>       hostname = "$host.name$"
>       service = "$service.name$"
>     }
>   }
> }
>
>
> ----- On Sep 19, 2017, at 9:02 AM, Michael Friedrich
> michael.friedr...@icinga.com wrote:
>
>>> On 19. Sep 2017, at 14:51, Jered Floyd <je...@convivian.com> wrote:
>>>
>>> Icinga Users,
>>>
>>> I'm running Icinga 2.7.0 on Debian 8.9 (Jessie), using the packages from the
>>> official repository.
>>>
>>> I find that every few weeks Icinga uses up all of the available memory and
>>> sub-processes are killed by the OOM-killer repeatedly. (It balloons from an
>>> RSS of about 32M to 1GB+.)
>>>
>>> Data:
>>> 1) I haven't yet been able to strongly correlate this with any causative
>>> environmental factors.
>>
>> Are any work queue metrics available which would show an increasing value
>> (ido-mysql, influxdb, etc.)? You can query that via the API /v1/status
>> endpoint, the “icinga” check, or check that inside the logs.
>>
>>> 2) When this occurs general monitoring continues but statistics are no longer
>>> written via the InfluxdbWriter. Not sure if this is cause or effect.
>>
>> Please share the configuration for InfluxdbWriter, especially whether TLS is
>> enabled.
>>
>> Kind regards,
>> Michael
>>
>>> 3) It seems to happen quite rapidly, as the final check_memory logged to
>>> InfluxDB shows 1.5 GB free, and a low memory alert is never triggered within
>>> Icinga.
>>>
>>> 4) There was a time when this problem did not exist (several months ago) but I
>>> cannot identify when specifically it started.
>>>
>>> Any suggestions on how to start debugging this issue? Unfortunately my gdb-fu
>>> is relatively weak....
>>>
>>> Thanks,
>>> --Jered
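For reference, here is the curl sketch mentioned above. It is only a rough
example and makes a few assumptions: the default API port 5665, an ApiUser
named "root", and a self-signed certificate (hence -k); replace host, port
and credentials with whatever your environment uses.

  # Dump the full status document; the InfluxdbWriter entry carries the work
  # queue counters (exact field names vary a bit between releases).
  curl -k -s -u root:REDACTED 'https://localhost:5665/v1/status' \
      | python -m json.tool | less

The same counters show up in the WorkQueue log lines quoted above; assuming
the default log location of the Debian packages, they can be pulled out with:

  # Show the most recent work queue samples for the InfluxdbWriter feature.
  grep 'WorkQueue.*InfluxdbWriter' /var/log/icinga2/icinga2.log | tail

If "items" keeps climbing after the "Response timeout" warning while the
rate stays flat, the writer has stopped draining its queue, which is the
behaviour tracked in https://github.com/Icinga/icinga2/issues/5469.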