[ https://issues.apache.org/jira/browse/KAFKA-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Di Campo updated KAFKA-8165:
----------------------------
    Environment: 
        Kafka 2.1, Kafka Streams 2.1
        Amazon Linux, on Docker based on wurstmeister/kafka image

  was:
        Amazon Linux container, on Docker based on wurstmeister/kafka image.


> Streams task causes Out Of Memory after connection and store restoration
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-8165
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8165
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.1.0
>         Environment: Kafka 2.1, Kafka Streams 2.1
> Amazon Linux, on Docker based on wurstmeister/kafka image
>            Reporter: Di Campo
>            Priority: Major
>
> With a Kafka Streams 2.1 application and stable Kafka brokers, the (largely stateful) application had been consuming ~160 messages per second at a sustained rate for several hours.
> However, it then started having connection issues to the brokers:
> {code:java}
> Connection to node 3 (/172.31.36.118:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> {code}
> It also began logging many of these warnings:
> {code:java}
> WARN [Consumer clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-2-consumer, groupId=stream-processor] 1 partitions have leader brokers without a matching listener, including [broker-2-health-check-0] (org.apache.kafka.clients.NetworkClient)
> {code}
> The _health-check_ topic exists on the broker but is not consumed by this topology or used in any way by the Streams application (it is only a broker health check). The application does not complain about topics that are actually consumed by the topology.
> Some time after these errors (which appear at a rate of ~24 per second for about 5 minutes), the following logs appear:
> {code:java}
> [2019-03-27 15:14:47,709] WARN [Consumer clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-1-restore-consumer, groupId=] Connection to node -3 (/ip3:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> {code}
> In between 6 and then 3 lines of "Connection could not be established" warnings, 3 messages like this one slipped in:
> [2019-03-27 15:14:47,723] WARN Started Restoration of visitorCustomerStore partition 15 total records to be restored 17 (com.divvit.dp.streams.applications.monitors.ConsoleGlobalRestoreListener)
> ... one for each of my KV stores (although one other KV store does not appear, and a WindowedStore does not appear either).
> Then I finally see "Restoration Complete" messages (logged via a ConsoleGlobalRestoreListener, as in the docs) for all of my stores, so it seems processing should be able to resume.
> Three minutes later, some events get processed, and I see an OOM error:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> ... so given that the application usually processes for hours under the same circumstances, I wonder whether there is a memory leak in the connection resources or somewhere else in the handling of this scenario.
> Kafka and Kafka Streams 2.1
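For reference, a minimal sketch of the kind of logging restore listener mentioned above, modelled on the StateRestoreListener example in the Kafka Streams documentation. The reporter's actual com.divvit.dp.streams.applications.monitors.ConsoleGlobalRestoreListener is not shown in the report, so the class body and the exact log wording here are assumptions that only approximate the quoted log lines:

{code:java}
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

// Hypothetical console-logging restore listener, following the docs example;
// the reporter's real implementation may differ.
public class ConsoleGlobalRestoreListener implements StateRestoreListener {

    @Override
    public void onRestoreStart(final TopicPartition topicPartition,
                               final String storeName,
                               final long startingOffset,
                               final long endingOffset) {
        // Produces lines like "Started Restoration of visitorCustomerStore
        // partition 15 total records to be restored 17" seen above.
        System.out.println("Started Restoration of " + storeName
            + " partition " + topicPartition.partition()
            + " total records to be restored " + (endingOffset - startingOffset));
    }

    @Override
    public void onBatchRestored(final TopicPartition topicPartition,
                                final String storeName,
                                final long batchEndOffset,
                                final long numRestored) {
        System.out.println("Restored batch of " + numRestored + " records for "
            + storeName + " partition " + topicPartition.partition());
    }

    @Override
    public void onRestoreEnd(final TopicPartition topicPartition,
                             final String storeName,
                             final long totalRestored) {
        // Produces the "Restoration Complete" lines mentioned above.
        System.out.println("Restoration Complete for " + storeName
            + " partition " + topicPartition.partition());
    }
}
{code}

Such a listener is registered once per application via KafkaStreams#setGlobalStateRestoreListener(...) before the KafkaStreams instance is started.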