GitHub user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2559

Actually, I just discovered that the problem is different altogether. While the KafkaConsumer is polling for new data (with a timeout), it holds the consumer lock. If no new data arrives from Kafka, the lock is not released until the poll timeout expires, and during that time neither a `commitSync` nor a `commitAsync` call can be issued. The `notifyCheckpointComplete` method therefore blocks until the poll timeout is over and the lock is released.

We can fix this in two ways. First, we wake up the consumer so that it returns from the poll and releases the lock. Second, we make the lock acquisition fair so that the committer gets the lock next. So that the committer method releases the lock quickly, the commit itself should still be asynchronous.
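A minimal, self-contained sketch of the proposed fix, under simplifying assumptions. `FakeConsumer` is a hypothetical stand-in for `KafkaConsumer`. Its `poll` simply returns early after `wakeup()`, whereas the real client's `poll` throws a `WakeupException`. The names `WakeupCommitSketch`, `runFetchLoop`, `commitOffsets` and `measureCommitLatencyMillis` are illustrative only and are not Flink's actual code. The sketch combines three pieces: a wakeup to end the empty poll, a fair `ReentrantLock` so that the waiting committer is served before the poll loop can re-lock, and a non-blocking commit inside the critical section.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class WakeupCommitSketch {

    /** Hypothetical stand-in for KafkaConsumer: poll() blocks until wakeup() or the timeout. */
    static final class FakeConsumer {
        private final Object signal = new Object();
        private boolean wakeupRequested = false; // guarded by signal

        /** Blocks for up to timeoutMs with no data; returns early if wakeup() was called. */
        void poll(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            synchronized (signal) {
                while (!wakeupRequested) {
                    long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        return; // timed out without data
                    }
                    signal.wait(remainingMs);
                }
                wakeupRequested = false; // consume the wakeup
            }
        }

        /** Safe to call from any thread, like KafkaConsumer.wakeup(). */
        void wakeup() {
            synchronized (signal) {
                wakeupRequested = true;
                signal.notifyAll();
            }
        }

        /** Placeholder: a real consumer would send offsets without waiting for the broker. */
        void commitAsync() {
        }
    }

    private final FakeConsumer consumer = new FakeConsumer();
    // fair = true: a thread already queued for the lock gets it before the poll loop re-locks
    private final ReentrantLock consumerLock = new ReentrantLock(true);
    private volatile boolean running = true;

    /** Fetch loop: holds the consumer lock for the whole duration of each poll. */
    void runFetchLoop(long pollTimeoutMs) throws InterruptedException {
        while (running) {
            consumerLock.lock();
            try {
                consumer.poll(pollTimeoutMs);
            } finally {
                consumerLock.unlock();
            }
        }
    }

    /** On checkpoint completion: wake the poller, take the lock, commit asynchronously. */
    void commitOffsets() throws InterruptedException {
        // Re-send the wakeup while waiting: one wakeup could be consumed by a poll that
        // returns and re-locks before this thread is queued on the (fair) lock.
        do {
            consumer.wakeup();
        } while (!consumerLock.tryLock(10, TimeUnit.MILLISECONDS));
        try {
            consumer.commitAsync(); // non-blocking, so the lock is released quickly
        } finally {
            consumerLock.unlock();
        }
    }

    /** Measures how long commitOffsets() waits while the fetch loop sits in an empty poll. */
    static long measureCommitLatencyMillis(long pollTimeoutMs) {
        WakeupCommitSketch sketch = new WakeupCommitSketch();
        Thread fetcher = new Thread(() -> {
            try {
                sketch.runFetchLoop(pollTimeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.setDaemon(true);
        fetcher.start();
        try {
            Thread.sleep(50); // let the fetcher enter a long poll with no data
            long start = System.nanoTime();
            sketch.commitOffsets();
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            sketch.running = false;
            sketch.consumer.wakeup(); // end the last poll so the fetcher exits
            fetcher.join();
            return elapsedMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        long waited = measureCommitLatencyMillis(5_000);
        System.out.println("commit waited " + waited + " ms with a 5000 ms poll timeout");
    }
}
```

Without the `wakeup()` call, `commitOffsets` would wait for roughly the full 5000 ms poll timeout. With it, the commit should go through after a few milliseconds. The committer re-sends the wakeup inside the `tryLock` loop to cover a narrow race. A single wakeup can end one poll, but the fetch thread may then re-lock before the committer is queued. Because the flag is a boolean, the extra wakeups cost at most one spurious empty poll.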