Never mind. Looking at the logs, I saw that it was having issues connecting to ZK. To make it short, I had the wrong port.
It is now starting. Tomorrow I’ll try to kill some JobManagers *evil*.

Another question: if I have multiple HA Flink jobs, are there any points to check to be sure that they won’t collide on HDFS or ZK?

B.R.

Gwenhaël PASQUIERS

From: Till Rohrmann [mailto:till.rohrm...@gmail.com]
Sent: Wednesday, 18 November 2015 18:01
To: user@flink.apache.org
Subject: Re: YARN High Availability

Hi Gwenhaël,

do you have access to the yarn logs?

Cheers,
Till

On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> wrote:

Hello,

We’re trying to set up high availability using an existing ZooKeeper quorum already running in our Cloudera cluster.

So, as per the docs, we’ve changed the max attempts in YARN’s config as well as in flink.yaml:

recovery.mode: zookeeper
recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
recovery.zookeeper.storageDir: hdfs:///flink/recovery/
yarn.application-attempts: 1000

Everything is OK as long as recovery.mode is commented out. As soon as I uncomment recovery.mode, the deployment on YARN is stuck on:

“Deploying cluster, current state ACCEPTED”.
“Deployment took more than 60 seconds….”

Every second.

And I have more than enough resources available on my YARN cluster.

Do you have any idea of what could cause this, and/or which logs I should look at in order to understand?

B.R.

Gwenhaël PASQUIERS
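
Since the problem in this thread turned out to be a wrong port in recovery.zookeeper.quorum, a quick reachability check on each quorum entry can rule that out before deploying. The following is only a minimal sketch: the helper names parse_quorum and is_reachable are hypothetical (not part of Flink or ZooKeeper), it assumes every entry has an explicit port, and a successful TCP connection only shows that something is listening there, not that it is ZooKeeper.

```python
import socket


def parse_quorum(quorum):
    """Split a 'host:port,host:port' string into (host, port) tuples.

    Assumes every entry carries an explicit port; an entry without one
    raises ValueError when the port is converted to int.
    """
    servers = []
    for entry in quorum.split(","):
        # rpartition splits on the last ':' so the port is always the tail.
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

With the placeholder value from the configuration above, parse_quorum("host1:3181,host2:3181,host3:3181") yields one (host, port) pair per server, and each pair can be passed to is_reachable from the machine that submits the YARN session.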