Hi Averell,

That log file does not look complete. I do not see any INFO level log messages such as [1].
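One quick way to check whether a collected log contains any INFO level entries is to count lines per log level. The following is only a minimal sketch: the function name `count_levels` is hypothetical, and it assumes log4j-style lines in which the level appears as a standalone uppercase token, which may not match your logging pattern.

```python
import re
from collections import Counter

# Assumption: levels appear as standalone uppercase tokens (log4j-style).
# Adjust this pattern if the actual log layout differs.
LEVEL_RE = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b")


def count_levels(lines):
    """Count the first log-level token found on each line.

    Lines without a recognizable level (e.g. stack trace
    continuation lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage sketch (the file name is whatever you saved the output as):
#   with open("flink.log", errors="replace") as fh:
#       print(dict(count_levels(fh)))
```

If the resulting counts contain no INFO key at all, the collected file is probably missing the JobManager log, or the logger was configured at a higher level.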
Best,
Gary

[1] https://github.com/apache/flink/blob/46326ab9181acec53d1e9e7ec8f4a26c672fec31/flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java#L544

On Fri, Feb 1, 2019 at 12:18 AM Averell <lvhu...@gmail.com> wrote:

> Hi Gary,
>
> I faced a similar problem yesterday, but don't know what the cause was yet.
> The situation that I observed is as follows:
> - At about 2:57, one of my EMR execution nodes (IP ...99) got disconnected
> from the YARN resource manager (on the RM I could not see that node anymore),
> even though the node was still running. <<< This is another issue, but I
> believe it is with YARN.
> - About 8 hours after that (between 10:00 and 11:00), I turned the
> problematic EMR core node off. AWS spun up another node and added it to the
> cluster to replace it. The YARN RM soon recognized the new node and added it
> to its list of available nodes.
> However, the JM seemed unable to do anything after that. It kept
> trying to start the job, and failed after the timeout with that "no resource
> available" exception again and again. No jobmanager logs have been recorded
> since 2:57:15, though.
>
> I am attaching the logs collected via "yarn logs --applicationId <appId>"
> here. But it seems I still missed something.
>
> I am using Flink 1.7.1, with yarn-site configuration
> yarn.resourcemanager.am.max-attempts=5. Flink configurations are all at
> their default values.
>
> Thanks and best regards,
> Averell
>
> flink.log
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/flink.log>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/