Hi Gwenhaël, do you have access to the yarn logs?

Cheers,
Till

On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> wrote:

> Hello,
>
> We’re trying to set up high availability using an existing zookeeper
> quorum already running in our Cloudera cluster.
>
> So, as per the doc we’ve changed the max attempt in yarn’s config as well
> as the flink.yaml.
>
> recovery.mode: zookeeper
> recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
> state.backend: filesystem
> state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
> recovery.zookeeper.storageDir: hdfs:///flink/recovery/
> yarn.application-attempts: 1000
>
> Everything is ok as long as recovery.mode is commented.
> As soon as I uncomment recovery.mode the deployment on yarn is stuck on:
>
> “Deploying cluster, current state ACCEPTED”.
> “Deployment took more than 60 seconds….”
> Every second.
>
> And I have more than enough resources available on my yarn cluster.
>
> Do you have any idea of what could cause this, and/or what logs I should
> look for in order to understand?
>
> B.R.
>
> Gwenhaël PASQUIERS