Hello Averell,

I don't think ZK data is stored on a master node. And Flink JM data is usually stored on a DFS, at the location given by "high-availability.storageDir" [1].
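
For reference, a minimal flink-conf.yaml sketch of the HA settings involved. The ZooKeeper host, the storage path, the root znode and the attempts value below are placeholders I made up for illustration, not values from your cluster:

```yaml
# Sketch only: every value here is a hypothetical placeholder.
high-availability: zookeeper
# ZooKeeper quorum that holds the pointer to the JM metadata.
high-availability.zookeeper.quorum: zk-host-1:2181
# Root znode under which Flink keeps its HA entries.
high-availability.zookeeper.path.root: /flink
# JM metadata itself is written here; this should be a durable DFS path.
high-availability.storageDir: hdfs:///flink/ha/
# Flink-side limit on application master attempts; YARN's
# yarn.resourcemanager.am.max-attempts acts as the upper bound.
yarn.application-attempts: 4
```

With a setup like this, ZooKeeper keeps only a pointer and the actual recovery data lives under the storageDir, so both have to survive a failure for recovery to work.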

In either case, for Flink to be HA, Yarn should also be HA, and I think this is not the case with a single master node.
Please consider a multi-master EMR setup [2].

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#high-availability-storagedir
[2] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-ha.html

Regards,
Roman

On Tue, Oct 20, 2020 at 12:13 AM Averell <lvhu...@gmail.com> wrote:

> Hello Roman,
>
> Thanks for your time.
> I'm using EMR 5.30.1 (Flink 1.10.0) with 1 master node.
> /yarn.application-attempts/ is not set (does that mean unlimited?), while
> /yarn.resourcemanager.am.max-attempts/ is 4.
>
> In saying "EMR cluster crashed" I meant the cluster is lost. Some scenarios
> which could lead to this are:
> - The master node is down
> - The cluster is accidentally / deliberately terminated.
>
> I found a thread in our mailing list [1], in which Fabian mentioned a
> /"pointer"/ stored in Zookeeper. It looks like this piece of information is
> stored in Zookeeper's dataDir, which is by default stored in the local
> storage of the EMR's master node. I'm trying to move this one to an EFS, in
> the hope that it would help. Not sure whether this is the right approach.
>
> Thanks for your help.
> Regards,
> Averell
>
> [1]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/HA-and-zookeeper-tp27093p27119.html
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/