[ https://issues.apache.org/jira/browse/KAFKA-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Di Campo updated KAFKA-8165:
----------------------------
    Environment: 
Kafka 2.1, Kafka Streams 2.1
Amazon Linux, on Docker based on wurstmeister/kafka image

  was:Amazon Linux container, on Docker based on wurstmeister/kafka image.


> Streams task causes Out Of Memory after connection and store restoration
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-8165
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8165
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.1.0
>         Environment: Kafka 2.1, Kafka Streams 2.1
> Amazon Linux, on Docker based on wurstmeister/kafka image
>            Reporter: Di Campo
>            Priority: Major
>
> I have a Kafka Streams 2.1 application. While the Kafka brokers were stable, the 
> (largely stateful) application consumed ~160 messages per second at a sustained 
> rate for several hours. 
> However, it then started having connection issues to the brokers. 
> {code:java}
> Connection to node 3 (/172.31.36.118:9092) could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> It also began showing a lot of these errors: 
> {code:java}
> WARN [Consumer 
> clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-2-consumer,
>  groupId=stream-processor] 1 partitions have leader brokers without a 
> matching listener, including [broker-2-health-check-0] 
> (org.apache.kafka.clients.NetworkClient){code}
> In fact, the _health-check_ topic exists on the broker but is not consumed by 
> this topology or used in any way by the Streams application (it is just a broker 
> health check). The client does not complain about topics that are actually 
> consumed by the topology. 
> Some time after these errors (which appear at a rate of ~24 per second for 
> about 5 minutes), the following logs appear: 
> {code:java}
> [2019-03-27 15:14:47,709] WARN [Consumer 
> clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-1-restore-consumer,
>  groupId=] Connection to node -3 (/ip3:9092) could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> In between 6 and then 3 lines of "Connection could not be established" error 
> messages, 3 lines like the following slipped in: 
> {code:java}
> [2019-03-27 15:14:47,723] WARN Started Restoration of visitorCustomerStore 
> partition 15 total records to be restored 17 
> (com.divvit.dp.streams.applications.monitors.ConsoleGlobalRestoreListener){code}
>  
> ... one for each of the KV stores I have (one more KV store and a WindowedStore 
> do not appear, though). 
> Then I finally see "Restoration Complete" messages for all of my stores (logged 
> by a ConsoleGlobalRestoreListener, as in the docs), so it seems it should now be 
> safe to resume processing.
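> For reference, the listener is essentially the console restore listener from the 
> Kafka Streams docs; a minimal sketch, assuming the standard StateRestoreListener 
> API (the log wording matches the messages above, the rest is illustrative): 
> {code:java}
> import org.apache.kafka.common.TopicPartition;
> import org.apache.kafka.streams.processor.StateRestoreListener;
> 
> // Logs restoration progress for each state store. It is registered once on the
> // KafkaStreams instance via setGlobalStateRestoreListener(...) before start().
> public class ConsoleGlobalRestoreListener implements StateRestoreListener {
> 
>     @Override
>     public void onRestoreStart(TopicPartition topicPartition, String storeName,
>                                long startingOffset, long endingOffset) {
>         System.out.printf("Started Restoration of %s partition %d total records to be restored %d%n",
>                 storeName, topicPartition.partition(), endingOffset - startingOffset);
>     }
> 
>     @Override
>     public void onBatchRestored(TopicPartition topicPartition, String storeName,
>                                 long batchEndOffset, long numRestored) {
>         // Called after each batch of records is restored from the changelog.
>         System.out.printf("Restored batch of %d records for %s partition %d%n",
>                 numRestored, storeName, topicPartition.partition());
>     }
> 
>     @Override
>     public void onRestoreEnd(TopicPartition topicPartition, String storeName,
>                              long totalRestored) {
>         System.out.printf("Restoration Complete for %s partition %d (%d records)%n",
>                 storeName, topicPartition.partition(), totalRestored);
>     }
> }{code}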
> Three minutes later, some events get processed, and I see an OOM error:  
> {code:java}
> java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
>  
> ... so given that it usually processes for hours under the same circumstances, 
> I'm wondering whether there is a memory leak in the connection resources or 
> somewhere else in the handling of this scenario.
> Kafka and Kafka Streams 2.1
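> A heap dump on the next OOM should show whether connection resources are what 
> accumulates; standard HotSpot flags for that (the dump path is just 
> illustrative): 
> {code:java}
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/streams-oom.hprof{code}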



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
