Hi,

We are running into an issue where GC pauses cause TaskManagers to be 
incorrectly marked as dead.
The Flink 
documentation<https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#distributed-coordination-via-akka>
 describes several Akka configuration options that can be tuned.

For “akka.watch.heartbeat.pause”, the documentation says: “Higher value increases 
the time to detect a dead TaskManager”

Can someone please help me understand the downside of increasing the time to 
detect a dead TaskManager?
Will this affect the fault tolerance guarantees, state management, or 
checkpointing?

Thanks,
Abhinav

