Hi Rainie,
I believe we need the full JobManager log to understand what's going on
with your job. The logs you've provided so far only tell us that a
TaskManager has died (which is expected when a node goes down). What is
interesting is what happens next: do we have enough resources
Thank you Yang, I checked "yarn.application-attempts" is already set to 10.
Here is the exception part from the JobManager log. The full log file is too
big, and I also redacted it to remove some company-specific info.
Any suggestion to this exception would be appreciated!
2020-07-15 20:04:52,265 INFO org.
Could you check whether the JobManager was also running on the lost
Yarn NodeManager?
If it is the case, you need to configure "yarn.application-attempts" to a
value bigger than 1.
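For example, in flink-conf.yaml (the value below is just an illustration, not a recommendation for your cluster):

```yaml
# flink-conf.yaml -- illustrative value, adjust to your cluster
# Allow the YARN ApplicationMaster (and with it the JobManager)
# to be restarted up to 4 times before the application fails.
yarn.application-attempts: 4
```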
BTW, the logs you provided are not Yarn NodeManager logs. And if you could
provide the full JobManager log,
Hi Flink help,
I am new to Flink.
I am investigating one Flink app that cannot restart when we lose the YARN
NodeManager (tc.yarn.rm.cluster.NumActiveNMs=0), while other Flink apps can
restart automatically.
*Here is the job's restart policy setting:*
*env.setRestartStrategy(RestartStrategies.fixedDelay
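The truncated line above appears to use Flink's programmatic fixed-delay restart strategy. For reference, the equivalent flink-conf.yaml settings look like this (the values are an illustrative sketch, not this job's actual configuration):

```yaml
# flink-conf.yaml -- illustrative fixed-delay restart strategy
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3   # restarts before the job is declared failed
restart-strategy.fixed-delay.delay: 10 s   # pause between restart attempts
```

Note that a restart strategy set programmatically via env.setRestartStrategy(...) overrides whatever is configured in flink-conf.yaml.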