GitHub user shanthoosh opened a pull request: https://github.com/apache/samza/pull/420

    SAMZA-1572: Add fixed retries on failure in KafkaCheckpointManager.

    KafkaCheckpointManager.writeCheckpoint currently goes into an infinite
    loop when an irrecoverable failure happens. This blocks the commit phase
    indefinitely (thereby preventing processing).

    This change adds a finite number of retries (50), so a failed write is
    retried for a fixed time before giving up.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza add_fixed_retries_in_kafka_checkpoint_manager

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #420

----
commit 8b98814e9d96b17a5772d079c20832f6f094640e
Author: Shanthoosh Venkataraman <svenkataraman@...>
Date:   2018-01-25T22:10:28Z

    SAMZA-1572: Add fixed retries on failure in KafkaCheckpointManager.

----
---
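
For readers unfamiliar with the pattern, the bounded-retry idea described above can be sketched roughly as follows. This is not the code from this pull request: the class `BoundedRetry`, the method `callWithRetries`, the exception type, and the backoff parameter are all hypothetical names chosen for illustration. Only the limit of 50 attempts comes from the description.

```java
import java.util.concurrent.Callable;

/** Illustrative sketch only; not the KafkaCheckpointManager implementation. */
public class BoundedRetry {

    /** Thrown when every attempt has failed; wraps the last failure seen. */
    public static class RetriesExhaustedException extends RuntimeException {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the action up to maxAttempts times, sleeping backoffMs between
     * failed attempts, and gives up with an exception instead of looping forever.
     */
    public static <T> T callWithRetries(Callable<T> action, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                lastFailure = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new RetriesExhaustedException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new RetriesExhaustedException(
            "Giving up after " + maxAttempts + " failed attempts", lastFailure);
    }
}
```

A caller would wrap the failing operation, for example `BoundedRetry.callWithRetries(() -> doWrite(), 50, 100L)`, where `doWrite` and the 100 ms backoff are placeholders. The point of the design is that an irrecoverable failure surfaces as an exception after a bounded wait, rather than blocking the commit phase indefinitely.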