Hi,

> On 19. Sep 2017, at 17:15, Jered Floyd <je...@convivian.com> wrote:
> 
> 
> Michael,
> 
> Excellent intuition!  This looks very much like an issue with the
> InfluxdbWriter queue: Icinga loses the connection, doesn't attempt to
> reconnect, and queues up all the data indefinitely.  TLS is enabled, and the
> configuration is below.  I'm guessing this is
> https://github.com/Icinga/icinga2/issues/5469 ?

Yep, my question was to find out whether TLS is in use, so I could point you to
the issue. 2.7.1 is going to be released tomorrow, if nothing else comes up.

Kind regards,
Michael

> 
> Regards,
> --Jered
> 
> 
> From the most recent instance (filtered to InfluxDB-related messages):
> 
> [2017-09-18 10:13:11 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 6.03333/s (362/min 1814/5min 5446/15min);
> [2017-09-18 10:18:21 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 6.01667/s (361/min 1813/5min 5443/15min);
> [2017-09-18 10:23:31 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 5.98333/s (359/min 1814/5min 5441/15min);
> [2017-09-18 10:26:58 -0400] warning/InfluxdbWriter: Response timeout of TCP socket from host '127.0.0.1' port '8086'.
> [2017-09-18 10:28:21 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 10, rate: 6.01667/s (361/min 1810/5min 5440/15min);
> [2017-09-18 10:28:31 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 68, rate: 6.01667/s (361/min 1810/5min 5440/15min); empty in 11 seconds
> [2017-09-18 10:28:41 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 132, rate: 6.01667/s (361/min 1810/5min 5440/15min); empty in 20 seconds
> [2017-09-18 10:28:51 -0400] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 200, rate: 6.01667/s (361/min 1810/5min 5440/15min); empty in 29 seconds
> 
> ... and the queue keeps growing from there.  There are no errors noted in the 
> InfluxDB logs.
> 
> 
> /etc/icinga2/features-enabled/influxdb.conf:
> 
> /**
>  * The InfluxdbWriter type writes check result metrics and
>  * performance data to an InfluxDB HTTP API
>  */
> 
> library "perfdata"
> 
> object InfluxdbWriter "influxdb" {
>  host = "127.0.0.1"
>  port = 8086
>  ssl_enable = true
>  database = "icinga2"
>  username = "icinga2"
>  password = "REDACTED"
> 
>  enable_send_thresholds = true
>  enable_send_metadata = true
> 
>  host_template = {
>    measurement = "$host.check_command$"
>    tags = {
>      hostname = "$host.name$"
>    }
>  }
>  service_template = {
>    measurement = "$service.check_command$"
>    tags = {
>      hostname = "$host.name$"
>      service = "$service.name$"
>    }
>  }
> }
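
Side note: the database end can be ruled out quickly by hitting InfluxDB's
/ping endpoint over the same host/port and TLS the writer uses. A minimal
sketch in Python (assuming the requests library is available and the
certificate is self-signed, hence verify=False):

    import requests

    # Host and port taken from the InfluxdbWriter config above; adjust as needed.
    resp = requests.get("https://127.0.0.1:8086/ping", verify=False, timeout=5)

    # InfluxDB answers 204 No Content when its HTTP(S) API is up and reachable.
    print(resp.status_code)

A 204 here while the work queue keeps growing points at the Icinga side, which
matches the reconnect issue linked above.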
> 
> 
> 
> 
> 
> ----- On Sep 19, 2017, at 9:02 AM, Michael Friedrich 
> michael.friedr...@icinga.com wrote:
> 
>>> On 19. Sep 2017, at 14:51, Jered Floyd <je...@convivian.com> wrote:
>>> 
>>> 
>>> Icinga Users,
>>> 
>>> I'm running Icinga 2.7.0 on Debian 8.9 (Jessie), using the packages from the
>>> official repository.
>>> 
>>> I find that every few weeks Icinga uses up all of the available memory and
>>> sub-processes are killed by the OOM-killer repeatedly.  (It balloons from an
>>> RSS of about 32M to 1GB+.)
>>> 
>>> Data:
>>>  1) I haven't yet been able to strongly correlate this with any causative
>>>  environmental factors.
>> 
>> Are any work queue metrics showing an increasing value (ido-mysql, influxdb,
>> etc.)? You can query them via the API /v1/status endpoint, via the “icinga”
>> check, or by looking at the logs.
>> 
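
For reference, the /v1/status endpoint mentioned above can be polled for those
work queue counters. A rough Python sketch (the ApiUser credentials and the
exact metric labels are placeholders; adjust them for your setup and Icinga
version):

    import requests

    # Default API port 5665; the credentials must match an ApiUser object.
    r = requests.get("https://localhost:5665/v1/status",
                     auth=("root", "REDACTED"),  # placeholder ApiUser credentials
                     verify=False)               # self-signed certificate assumed
    r.raise_for_status()

    # Print anything that looks like a work queue metric; a steadily growing
    # item count for the influxdb feature is the symptom described above.
    for result in r.json().get("results", []):
        for pd in result.get("perfdata") or []:
            text = pd if isinstance(pd, str) else str(pd)
            if "queue" in text.lower():
                print(result.get("name"), text)

The same counters also show up in the information/WorkQueue log lines quoted
at the top of this thread, so either source works for spotting the leak.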
>>> 
>>>  2) When this occurs, general monitoring continues but statistics are no
>>>  longer written via the InfluxdbWriter.  Not sure if this is cause or
>>>  effect.
>> 
>> Please share the configuration for InfluxdbWriter, especially whether TLS is
>> enabled.
>> 
>> Kind regards,
>> Michael
>> 
>> 
>>> 
>>>  3) It seems to happen quite rapidly, as the final check_memory logged to
>>>  InfluxDB shows 1.5 GB free, and a low memory alert is never triggered
>>>  within Icinga.
>>> 
>>>  4) There was a time when this problem did not exist (several months ago) 
>>> but I
>>>  cannot identify when specifically it started.
>>> 
>>> Any suggestions on how to start debugging this issue?  Unfortunately my 
>>> gdb-fu
>>> is relatively weak....
>>> 
>>> Thanks,
>>> --Jered

_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users
