Re: Akka heartbeat configurations

Timo Walther Tue, 15 May 2018 06:05:52 -0700

Hi,

increasing the time to detect a dead task manager usually increases the number of elements that need to be reprocessed in case of a failure. Once a dead task manager is identified, the entire application is rolled back to the latest successful checkpointed/consistent state of the application. So it is desirable to keep this time low in order to keep the time to catch up low. Fault tolerance guarantees should not be affected.
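
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is only a simplified model under stated assumptions (constant input rate, no progress between the failure and its detection), not something Flink computes itself; the function names and all numbers are hypothetical.

```python
def backlog_after_failure(input_rate_per_s, seconds_since_checkpoint, detection_delay_s):
    """Rough number of records to (re)process once the job restarts.

    Assumption: records arrive at a constant rate and the job makes no
    progress between the failure and its detection.
    """
    # Records processed after the last checkpoint must be replayed.
    replayed = input_rate_per_s * seconds_since_checkpoint
    # Records that kept arriving while the failure went undetected.
    accumulated = input_rate_per_s * detection_delay_s
    return replayed + accumulated


def catch_up_seconds(backlog, processing_rate_per_s, input_rate_per_s):
    """Time to drain the backlog while new input keeps arriving."""
    spare_capacity = processing_rate_per_s - input_rate_per_s
    if spare_capacity <= 0:
        raise ValueError("job cannot catch up without spare capacity")
    return backlog / spare_capacity


# Hypothetical numbers: 1000 records/s in, failure 30 s after a checkpoint.
short_delay = backlog_after_failure(1000, 30, 60)
long_delay = backlog_after_failure(1000, 30, 120)
print(short_delay, catch_up_seconds(short_delay, 1500, 1000))
print(long_delay, catch_up_seconds(long_delay, 1500, 1000))
```

In this model, doubling the detection delay adds a proportional amount to the backlog and to the catch-up time, while the checkpointed state the job restores from stays the same.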


I hope this helps.

Regards,
Timo

On 15.05.18 at 01:42, Bajaj, Abhinav wrote:

Hi,
We are running into issues where GC pauses result in TaskManagers being incorrectly marked dead.
The Flink documentation <https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#distributed-coordination-via-akka> documents some Akka configuration knobs to play around with.
Focusing on “akka.watch.heartbeat.pause”, it mentions “Higher value increases the time to detect a dead TaskManager”.
Can someone please help me understand the downside of increasing the time to detect a dead TaskManager?
Will this affect the fault tolerance guarantees, state management, or checkpointing?
Thanks,

Abhinav
