[ https://issues.apache.org/jira/browse/FLINK-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532900#comment-15532900 ]
ASF GitHub Bot commented on FLINK-4702: --------------------------------------- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/2559 Actually, just discovered that the problem is different all together. While the KafkaConsumer is polling for new data (with a timeout), it holds the consumer lock. If no data comes in Kafka, the lock is not released before the poll timeout is over. During that time, neither a "commitSync" nor "commitAsync" call can be fired off. The `notifyCheckpointComplete` method hence blocks until the poll timeout is over and the lock is released. We can fix this by making sure that the consumer is "woken up" to release the lock, and by making sure the lock acquisition is fair, so the committer will get it next. For the sake of releasing the lock fast in the committer method, it should still be an asynchronous commit. > Kafka consumer must commit offsets asynchronously > ------------------------------------------------- > > Key: FLINK-4702 > URL: https://issues.apache.org/jira/browse/FLINK-4702 > Project: Flink > Issue Type: Bug > Components: Kafka Connector > Affects Versions: 1.1.2 > Reporter: Stephan Ewen > Assignee: Stephan Ewen > Priority: Blocker > Fix For: 1.2.0, 1.1.3 > > > The offset commit calls to Kafka may occasionally take very long. > In that case, the {{notifyCheckpointComplete()}} method blocks for long and > the KafkaConsumer cannot make progress and cannot perform checkpoints. > Kafka 0.9+ have methods to commit asynchronously. > We should use those and make sure no more than one commit is concurrently in > progress, to that commit requests do not pile up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)