Re: Flink app cannot restart

2020-07-24 Thread Robert Metzger
Hi Rainie, I believe we need the full JobManager log to understand what's going on with your job. The logs you've provided so far only tell us that a TaskManager has died (which is expected when a node goes down). What would be interesting to see is what happens next: do we have enough resources

Re: Flink app cannot restart

2020-07-23 Thread Rainie Li
Thank you Yang, I checked, and "yarn.application-attempts" is already set to 10. Here is the exception part from the JobManager log. The full log file is too big, and I also redacted it to remove some company-specific info. Any suggestions on this exception would be appreciated! 2020-07-15 20:04:52,265 INFO org.

Re: Flink app cannot restart

2020-07-22 Thread Yang Wang
Could you check whether the JobManager is also running on the lost Yarn NodeManager? If that is the case, you need to set "yarn.application-attempts" to a value greater than 1. BTW, the logs you provided are not Yarn NodeManager logs. And if you could provide the full JobManager log,

Flink app cannot restart

2020-07-22 Thread Rainie Li
Hi Flink help, I am new to Flink. I am investigating a Flink app that cannot restart when we lose a Yarn NodeManager (tc.yarn.rm.cluster.NumActiveNMs=0), while other Flink apps can restart automatically. *Here is the job's restartPolicy setting:* *env.setRestartStrategy(RestartStrategies.fixedDelay