Hi,

I am investigating Flink’s fault tolerance capabilities and would like to
know which component and class are in charge of TaskManager failure detection
and checkpoint restoration. In addition, how does Flink actually determine
that a TaskManager has failed, e.g. due to a hardware failure?

As far as I know, the state should be restored by the
CheckpointCoordinator or the ExecutionGraph. Please correct me if I’m wrong.

Thanks in advance,
Dominik
