Hi Chenyu,
The typical reasons for a heartbeat timeout include:
1. Long GC pauses in the TaskManager (TM) or JobManager (JM)
2. Network instability
So, do the GC logs or the network monitoring metrics give
any hints?
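
If it helps, here is a rough sketch of how one might scan a GC log for
long pauses. It assumes the JVM writes JDK 9+ unified GC logging
(-Xlog:gc) in the line shape shown in SAMPLE; a Java 8 GC log looks
different and would need another pattern. The function name, the sample
lines, and the 5000 ms threshold are made up for illustration and are
not taken from your job. The threshold is only meant to be small
compared with heartbeat.timeout, which defaults to 50000 ms in Flink.

```python
import re

# Assumed line shape (JDK 9+ unified logging); these lines are invented samples.
SAMPLE = [
    "[2021-08-10T09:58:01.100+0000][info][gc] GC(41) Pause Young (Normal) "
    "(G1 Evacuation Pause) 512M->128M(1024M) 12.345ms",
    "[2021-08-10T09:58:30.123+0000][info][gc] GC(42) Pause Full "
    "(Allocation Failure) 3900M->3800M(4096M) 52000.125ms",
]

# Captures: GC id, pause description, pause duration in milliseconds.
PAUSE_RE = re.compile(
    r"GC\((\d+)\) (Pause .*?) \d+[KMG]->\d+[KMG]\(\d+[KMG]\) (\d+(?:\.\d+)?)ms"
)


def long_pauses(lines, threshold_ms=5000.0):
    """Return (gc_id, kind, pause_ms) for each pause of at least threshold_ms.

    threshold_ms is an arbitrary illustrative value, not a Flink setting.
    """
    hits = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match is None:
            continue  # not a pause line, or not in the assumed format
        pause_ms = float(match.group(3))
        if pause_ms >= threshold_ms:
            hits.append((int(match.group(1)), match.group(2), pause_ms))
    return hits


for gc_id, kind, pause_ms in long_pauses(SAMPLE):
    print(f"GC({gc_id}) {kind}: {pause_ms:.0f} ms")
```

To run it against a real log, pass an open file object instead of
SAMPLE, since the function only iterates over lines. Any pause close to
or above the heartbeat timeout around 09:58 would point to reason 1;
if there is none, the network side is the more likely place to look.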
Best,
Yun
--
Sender: Chenyu Zheng
Date:
JobManager timeout error:
2021-08-10 09:58:35,350 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Print
to Std. Out (79/128) (b498a5b17c87eb70c3da9aea93890e25) switched from DEPLOYING
to FAILED on stream-93072a8b402f49cca9c134a6e8b4887a-taskmanager-1-121 @
10.50.15