I also have jobs failing on a daily basis with the error "Heartbeat of TaskManager with id <id> timed out". I'm using Flink 1.5.2.
Could anyone suggest how to debug possible causes? I have already set the following in flink-conf.yaml, but the jobs still fail:

heartbeat.interval: 10000
heartbeat.timeout: 100000

Thanks.

On Sun, Jul 22, 2018 at 2:20 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> According to the UI it seems that "
>
> org.apache.flink.util.FlinkException: The assigned slot
> 208af709ef7be2d2dfc028ba3bbf4600_10 was removed.
>
> " was the cause of a pipe restart.
>
> As to the TM it is an artifact of the new job allocation regime which will
> exhaust all slots on a TM rather than distributing them equitably. TMs
> selectively are under more stress than in a pure RR distribution I think.
> We may have to lower the slots on each TM to define a good upper bound. You
> are correct 50s is a pretty generous value.
>
> On Sun, Jul 22, 2018 at 6:55 AM, Gary Yao <g...@data-artisans.com> wrote:
>
>> Hi,
>>
>> The first exception should be only logged on info level. It's expected to
>> see this exception when a TaskManager unregisters from the ResourceManager.
>>
>> Heartbeats can be configured via heartbeat.interval and heartbeat.timeout
>> [1]. The default timeout is 50s, which should be a generous value. It is
>> probably a good idea to find out why the heartbeats cannot be answered by
>> the TM.
>>
>> Best,
>> Gary
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/config.html#heartbeat-manager
>>
>> On Sun, Jul 22, 2018 at 1:36 AM, Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> 2 issues we are seeing on 1.5.1 on a streaming pipeline
>>>
>>> org.apache.flink.util.FlinkException: The assigned slot
>>> 208af709ef7be2d2dfc028ba3bbf4600_10 was removed.
>>>
>>> and
>>>
>>> java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id
>>> 208af709ef7be2d2dfc028ba3bbf4600 timed out.
>>>
>>> Not sure about the first, but how do we increase the heartbeat interval
>>> of a TM?
>>>
>>> Thanks much
>>>
>>> Vishal