The Flink job recovered with wrong checkpoint state.

Thomas Huang Sun, 14 Jun 2020 07:59:15 -0700

Hi Flink Community,

Currently, I'm using yarn-cluster mode to submit a Flink job on YARN. I
haven't configured high availability (ZooKeeper), but I have set a restart
strategy:


 env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 3000))

The number of restart attempts is 10 and the delay between attempts is
3000 ms (3 seconds).
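
Note that the second argument of this overload is a delay in milliseconds. Below is a minimal sketch of the same setting with the time unit spelled out, assuming Flink's Scala DataStream API. The object name RestartStrategySketch is only illustrative and is not taken from my job:

```scala
import java.util.concurrent.TimeUnit

import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.api.common.time.Time
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// Hypothetical wrapper object, used only to make the sketch self-contained.
object RestartStrategySketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Fixed-delay restart: up to 10 attempts, waiting 3 seconds between them.
    // Equivalent to fixedDelayRestart(10, 3000), where 3000 is in milliseconds.
    env.setRestartStrategy(
      RestartStrategies.fixedDelayRestart(10, Time.of(3, TimeUnit.SECONDS)))
  }
}
```

Passing a Time value instead of a bare number makes the unit visible at the call site.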

Today, the infra team performed a rolling restart of the YARN platform.
Although the job manager restarted, the job did not recover from the latest
checkpoint, and all task managers started from the default job configuration,
which was not expected.

Does this mean I have to set up a high availability configuration for
yarn-cluster mode, or is this a bug?

Best wishes.
