Hi Averell,

That log file does not look complete. I do not see any INFO-level log
messages such as [1].
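
One quick way to check whether a collected log contains any INFO lines is to
count the entries per log level. The snippet below is only a minimal sketch.
It assumes the lines follow Flink's default log4j layout (timestamp, then
level, then logger name), and the function name count_levels is made up for
illustration. If the logging configuration was customized, the pattern will
need adjusting.

```python
import re
from collections import Counter

# Assumption: each log entry starts like
#   "yyyy-MM-dd HH:mm:ss,SSS LEVEL  logger.name - message"
# which matches Flink's default log4j pattern. Adjust if your layout differs.
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+"
    r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b"
)


def count_levels(lines):
    """Count log entries per level; lines without a level prefix
    (stack traces, YARN container headers) are ignored."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling count_levels on the opened log file (for example
count_levels(open("flink.log", errors="replace"))) returns a Counter. If it
has no "INFO" entry, the log is most likely truncated or was collected at a
different log level.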

Best,
Gary

[1]
https://github.com/apache/flink/blob/46326ab9181acec53d1e9e7ec8f4a26c672fec31/flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java#L544

On Fri, Feb 1, 2019 at 12:18 AM Averell <lvhu...@gmail.com> wrote:

> Hi Gary,
>
> I faced a similar problem yesterday, but I don't yet know what caused it.
> What I observed is as follows:
>  - At about 2:57, one of my EMR execution nodes (IP ...99) got disconnected
> from the YARN resource manager (I could no longer see that node on the RM),
> even though the node was still running. <<< This is another issue, but I
> believe it is with YARN.
>  - About 8 hours after that (between 10:00 and 11:00), I turned the
> problematic EMR core node off. AWS spun up another node and added it to the
> cluster to replace it. The YARN RM soon recognized the new node and added it
> to its list of available nodes.
> However, the JM seemed unable to do anything after that. It kept
> trying to start the job, failing after the timeout with that "no resource
> available" exception again and again. No jobmanager logs were recorded after
> 2:57:15, though.
>
> I am attaching the logs collected via "yarn logs --applicationId <appId>"
> here, but it seems I am still missing something.
>
> I am using Flink 1.7.1, with the yarn-site configuration
> yarn.resourcemanager.am.max-attempts=5. All Flink configuration options are
> at their default values.
>
> Thanks and best regards,
> Averell
>
> flink.log
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/flink.log>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
