Nico Kruber created FLINK-10455:
-----------------------------------

             Summary: Potential Kafka producer leak in case of failures
                 Key: FLINK-10455
                 URL: https://issues.apache.org/jira/browse/FLINK-10455
             Project: Flink
          Issue Type: Bug
          Components: Kafka Connector
    Affects Versions: 1.5.2
            Reporter: Nico Kruber
If the Kafka brokers' transaction timeout is too low for our checkpoint interval [1], we may get a {{ProducerFencedException}}. The documentation around {{ProducerFencedException}} explicitly states that we should close the producer after encountering it. Looking at the code, this does not actually seem to happen in {{FlinkKafkaProducer011}} (see the first sketch below).

Also, if one transaction's commit in {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an exception, we neither clean up nor try to commit any of the other transactions: from what I can see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} simply iterates over {{pendingCommitTransactions}}, which is not touched during {{close()}} (see the second sketch below).

Now if we restart the failing job on the same Flink cluster, any resources from the previous attempt will still linger around.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011
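For illustration, here is a minimal, self-contained sketch of the Kafka producer contract in question (the topic name and transactional id are made up): once {{commitTransaction()}} throws a {{ProducerFencedException}}, the only safe thing left to do with the producer is {{close()}} it, which is exactly the step that appears to be missing.

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.ProducerFencedException;

public class FencedProducerContract {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "example-transactional-id"); // made-up id
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("example-topic", "key", "value"));
            producer.commitTransaction();
        } catch (ProducerFencedException e) {
            // Per the Kafka javadoc, the producer is unusable after this
            // exception and must be closed to release its resources.
            producer.close();
        }
    }
}
{code}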
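And a minimal sketch of the second problem (made-up transaction names, not the actual Flink code): when the commits are driven by a plain loop, the first failing commit propagates its exception and aborts the iteration, so every transaction queued behind it is neither committed nor cleaned up.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class PendingCommitSketch {
    // Stand-in for committing one pending transaction; txn-2 is rigged to fail.
    static void commit(String txn) {
        if (txn.equals("txn-2")) {
            throw new RuntimeException("commit failed for " + txn);
        }
        System.out.println("committed " + txn);
    }

    public static void main(String[] args) {
        // Stand-in for pendingCommitTransactions (insertion order preserved).
        Map<Long, String> pendingCommitTransactions = new LinkedHashMap<>();
        pendingCommitTransactions.put(1L, "txn-1");
        pendingCommitTransactions.put(2L, "txn-2");
        pendingCommitTransactions.put(3L, "txn-3");

        for (Map.Entry<Long, String> entry : pendingCommitTransactions.entrySet()) {
            commit(entry.getValue()); // the exception from txn-2 propagates here,
                                      // so txn-3 is never committed or aborted
        }
    }
}
{code}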