Re: HA on AWS EMR

2020-10-27 Thread Averell
Hello Robert, Thanks for the info. That makes sense. I will save and cancel my jobs with 1.10, upgrade to 1.11, and restore the jobs from the savepoints. Thanks and regards, Averell
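
The upgrade path described in this message (take a savepoint and stop the job on 1.10, upgrade, then restore on 1.11) can be sketched as a pair of Flink CLI invocations. The Python helpers below only build the argument lists and do not run anything. The function names, the job ID, the jar name, and the S3 paths are all hypothetical placeholders. The sketch assumes the `flink stop -p` form of "save and cancel"; a YARN session may also need extra options to select the target cluster, which are left out here.

```python
from typing import List


def stop_with_savepoint_cmd(job_id: str, savepoint_dir: str) -> List[str]:
    """Build the CLI call that stops a job and writes a savepoint.

    Uses `flink stop -p <savepointDir> <jobId>`; run this with the OLD
    Flink version (1.10 in this thread) before upgrading.
    """
    return ["flink", "stop", "-p", savepoint_dir, job_id]


def restore_from_savepoint_cmd(savepoint_path: str, jar_path: str) -> List[str]:
    """Build the CLI call that resubmits a job from a savepoint.

    Uses `flink run -s <savepointPath> <jar>`; run this with the NEW
    Flink version (1.11 in this thread) after upgrading.
    """
    return ["flink", "run", "-s", savepoint_path, jar_path]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(stop_with_savepoint_cmd("<job-id>", "s3://example-bucket/savepoints")))
    print(" ".join(restore_from_savepoint_cmd("s3://example-bucket/savepoints/<savepoint>", "my-job.jar")))
```

Keeping the two steps separate mirrors the point made in the reply below: the savepoint is the artifact that moves between versions, not the HA state held in ZooKeeper.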

Re: HA on AWS EMR

2020-10-27 Thread Robert Metzger
Hey Averell, to clarify: You should be able to migrate using a savepoint from 1.10 to 1.11. Restoring from the state stored in Zookeeper (for HA) with a newer Flink version won't work.

Re: HA on AWS EMR

2020-10-26 Thread Robert Metzger
Hey Averell, you should be able to migrate savepoints from Flink 1.10 to 1.11. Is there a simple way for me to reproduce this issue locally? This seems to be a rare, but probably valid issue. Are you using any special operators? (like the new source API?) Best, Robert On Wed, Oct 21, 2020 at 11

Re: HA on AWS EMR

2020-10-21 Thread Averell
Hello Roman, Thanks for the answer. I already have high-availability.storageDir configured to an S3 location. Our service is not critical enough, so to save cost we are using the single-master EMR setup. I understand that we'll not get YARN HA in that case, but what I expect here is

Re: HA on AWS EMR

2020-10-20 Thread Khachatryan Roman
Hello Averell, I don't think ZK data is stored on a master node. And Flink JM data is usually stored on DFS, at the location given by "high-availability.storageDir" [1]. In either case, for Flink to be HA, Yarn should also be HA. And I think this is not the case with a single master node. Please consider mu

Re: HA on AWS EMR

2020-10-19 Thread Averell
Hello Roman, Thanks for your time. I'm using EMR 5.30.1 (Flink 1.10.0) with 1 master node. /yarn.application-attempts/ is not set (does that mean unlimited?), while /yarn.resourcemanager.am.max-attempts/ is 4. By "EMR cluster crashed" I meant that the cluster is lost. Some scenarios which cou

Re: HA on AWS EMR

2020-10-19 Thread Khachatryan Roman
Hi, Can you explain what "EMR cluster crashed" means in the 2nd scenario? Can you also share: - yarn.application-attempts in Flink - yarn.resourcemanager.am.max-attempts in Yarn - number of EMR master nodes (1 or 3) - EMR version? Regards, Roman On Mon, Oct 19, 2020 at 8:22 AM Averell wrote:

HA on AWS EMR

2020-10-18 Thread Averell
Hi, I'm trying to enable HA for my Flink jobs running on AWS EMR. Following [1], I created a common Flink YARN session and am submitting all my jobs to that one. These 4 config params were added: /high-availability = zookeeper high-availability.storageDir = high-availability.zookeeper.pa
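
As a rough aid for checking a setup like the one described in this message, the sketch below parses `key: value` lines in the style of flink-conf.yaml and reports which HA keys are missing or empty. It checks only the two keys that are fully legible in this thread (`high-availability` and `high-availability.storageDir`); the message lists four parameters but is cut off, so the remaining ones are not guessed here. The function names and the sample bucket path are hypothetical.

```python
from typing import Dict, List

# Only the keys fully named in this thread; a real HA setup needs more.
HA_KEYS = ("high-availability", "high-availability.storageDir")


def parse_flink_conf(text: str) -> Dict[str, str]:
    """Parse simple `key: value` lines, skipping blanks and # comments."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first colon only, so values such as s3://... survive.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def missing_ha_keys(conf: Dict[str, str]) -> List[str]:
    """Return the checked HA keys that are absent or have an empty value."""
    return [key for key in HA_KEYS if not conf.get(key)]


if __name__ == "__main__":
    sample = (
        "high-availability: zookeeper\n"
        "high-availability.storageDir: s3://example-bucket/flink/ha\n"
    )
    print(missing_ha_keys(parse_flink_conf(sample)))
```

An empty result only means the two checked keys are present; it says nothing about whether YARN itself is highly available, which is the concern raised in the replies.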