[ https://issues.apache.org/jira/browse/FLINK-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685582#comment-16685582 ]
ASF GitHub Bot commented on FLINK-10774:
----------------------------------------

stevenzwu edited a comment on issue #7020: [FLINK-10774] [Kafka] connection leak when partition discovery is disabled an…
URL: https://github.com/apache/flink/pull/7020#issuecomment-438384937

@tillrohrmann

> shouldn't we close the partitionDiscoverer in the open method in case of a failure. Moreover, we could also close it there in the case if automatic partition discovery is disabled.

Right now, the if-else check for partition discovery is done in the `run` method, to decide whether we need to close the `partitionDiscoverer` before `runFetchLoop`. I didn't want to change that, unless we also want to move the starting of `discoveryLoopThread` into the `open` method. Is that what you have in mind?

I was thinking of the `cancel` method as the equivalent of a catch/finally block in Java. It is also the place where we already close `partitionDiscoverer` in the enabled case. I thought it might make sense to ensure the cleanup in the `cancel` method for both the disabled and the enabled case.

> in line FlinkKafkaConsumerBase.java:721 fails with an exception?

Line 721 is for the case where partition discovery is enabled; there, `partitionDiscoverer` is closed in the `cancel` method, at line 748.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> connection leak when partition discovery is disabled and open throws exception
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-10774
>                 URL: https://issues.apache.org/jira/browse/FLINK-10774
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>   Affects Versions: 1.4.2, 1.5.5, 1.6.2
>            Reporter: Steven Zhen Wu
>            Assignee: Steven Zhen Wu
>            Priority: Major
>              Labels: pull-request-available
>
> Here is the scenario to reproduce the issue:
> * partition discovery is disabled
> * the open method throws an exception (e.g. when broker SSL authorization
>   denies the request)
> In this scenario, the run method won't be executed. As a result,
> _partitionDiscoverer.close()_ won't be called. That causes a connection
> leak, because the KafkaConsumer is initialized but never closed. This caused
> an outage that brought down our Kafka cluster when a high-parallelism job got
> into a restart/failure loop.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
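
The leak described in the issue, and the two cleanup points discussed in the comment (closing on failure inside `open`, and closing again in `cancel` as a safety net), can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: `DiscovererCleanupSketch`, `Discoverer`, and `Source` are hypothetical stand-ins, and none of them are Flink or Kafka classes. The sketch also assumes that `close` is idempotent, so that calling it from both places is harmless.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: every class and method name here is hypothetical, not Flink's API. */
final class DiscovererCleanupSketch {

    /** Stand-in for a partition discoverer that owns a broker connection. */
    static final class Discoverer implements AutoCloseable {
        private final AtomicBoolean connected = new AtomicBoolean(false);
        private final boolean failDiscovery;

        Discoverer(boolean failDiscovery) {
            this.failDiscovery = failDiscovery;
        }

        void open() {
            connected.set(true); // the connection now exists and must be released
        }

        void discoverPartitions() {
            if (failDiscovery) {
                throw new IllegalStateException("simulated authorization failure");
            }
        }

        boolean isConnected() {
            return connected.get();
        }

        @Override
        public void close() {
            connected.set(false); // idempotent: closing twice is harmless
        }
    }

    /** Stand-in for a source function with open/cancel lifecycle methods. */
    static final class Source {
        private final Discoverer discoverer;
        private final boolean discoveryEnabled;

        Source(Discoverer discoverer, boolean discoveryEnabled) {
            this.discoverer = discoverer;
            this.discoveryEnabled = discoveryEnabled;
        }

        void open() {
            discoverer.open();
            try {
                discoverer.discoverPartitions();
            } catch (RuntimeException e) {
                // After a failed open(), run() never executes, so release here.
                discoverer.close();
                throw e;
            }
            if (!discoveryEnabled) {
                // No discovery loop will use the discoverer again.
                discoverer.close();
            }
        }

        void cancel() {
            // Safety net for both the enabled and the disabled case.
            discoverer.close();
        }
    }

    public static void main(String[] args) {
        Discoverer failing = new Discoverer(true);
        Source source = new Source(failing, false);
        try {
            source.open();
        } catch (IllegalStateException expected) {
            System.out.println("open failed, connected=" + failing.isConnected());
        }
    }
}
```

Without the `catch` block in `Source.open`, the failing case would leave `isConnected()` returning `true`, which corresponds to the leaked connection in the report. Whether the real fix belongs in `open`, in `cancel`, or in both is exactly the question under discussion in the comment; the sketch shows that the two are compatible as long as `close` tolerates repeated calls.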