Hello Averell,

I don't think ZK data is stored on a master node. Flink JM data is
usually stored on a DFS, in the location set by "high-availability.storageDir" [1].
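
As a rough sketch, the HA-related part of flink-conf.yaml could look
something like the fragment below. The ZooKeeper hosts, the storage path
and the attempt count are made-up placeholders, not values from your
cluster, so please adapt them:

```yaml
# Hypothetical values for illustration only; adjust to your environment.
high-availability: zookeeper
# ZooKeeper only keeps a pointer; the JM metadata itself goes to this DFS path.
high-availability.storageDir: hdfs:///flink/recovery
# Placeholder quorum; list your actual ZooKeeper hosts here.
high-availability.zookeeper.quorum: zk-host-1:2181,zk-host-2:2181,zk-host-3:2181
# Number of ApplicationMaster attempts requested by Flink; effectively
# capped by YARN's yarn.resourcemanager.am.max-attempts.
yarn.application-attempts: 4
```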

In either case, for Flink to be HA, YARN itself must also be HA, and I
don't think that is the case with a single master node. Please consider a
multi-master EMR setup [2].

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#high-availability-storagedir
[2] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-ha.html

Regards,
Roman


On Tue, Oct 20, 2020 at 12:13 AM Averell <lvhu...@gmail.com> wrote:

> Hello Roman,
>
> Thanks for your time.
> I'm using EMR 5.30.1 (Flink 1.10.0) with 1 master node.
> /yarn.application-attempts/ is not set (does that mean unlimited?), while
> /yarn.resourcemanager.am.max-attempts/ is 4.
>
> By "EMR cluster crashed" I meant that the cluster is lost. Some scenarios
> which could lead to this are:
>   - The master node is down
>   - The cluster is accidentally / deliberately terminated.
>
> I found a thread in our mailing list [1], in which Fabian mentioned a
> /"pointer"/ stored in Zookeeper. It looks like this piece of information is
> stored in Zookeeper's dataDir, which is by default stored in the local
> storage of the EMR's master node. I'm trying to move this one to an EFS, in
> the hope that it will help. I'm not sure whether this is the right approach.
>
> Thanks for your help.
> Regards,
> Averell
>
>
> [1]
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/HA-and-zookeeper-tp27093p27119.html
>
>
>
