Hi,

We are running into an issue where GC pauses cause TaskManagers to be incorrectly marked as dead. The Flink documentation <https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#distributed-coordination-via-akka> describes several Akka configuration options that can be tuned.
Focusing on “akka.watch.heartbeat.pause”, the documentation says “Higher value increases the time to detect a dead TaskManager”. Can someone please help me understand the downside of increasing the time to detect a dead TaskManager? Will this affect the fault tolerance guarantees, state management, or checkpointing?

Thanks,
Abhinav