YARN High Availability

Gwenhael Pasquiers Wed, 18 Nov 2015 08:57:08 -0800

Hello,

We're trying to set up high availability using an existing ZooKeeper quorum
that is already running in our Cloudera cluster.


So, as per the documentation, we've changed the maximum number of attempts in
YARN's configuration as well as in flink-conf.yaml:

recovery.mode: zookeeper
recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
recovery.zookeeper.storageDir: hdfs:///flink/recovery/
yarn.application-attempts: 1000
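
As a side check, here is a minimal sketch of how the quorum string above could be tested for basic TCP reachability from the machine that submits the job. It is not taken from the Flink documentation. The helper names parse_quorum and is_reachable are hypothetical, and an open port only shows that something is listening there, not that ZooKeeper itself is healthy.

```python
import socket


def parse_quorum(quorum):
    """Split 'host1:3181,host2:3181' into [(host, port), ...]."""
    members = []
    for entry in quorum.split(","):
        # rpartition keeps everything before the last ':' as the host name.
        host, _, port = entry.strip().rpartition(":")
        members.append((host, int(port)))
    return members


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Same value as recovery.zookeeper.quorum in the configuration above.
    quorum = "host1:3181,host2:3181,host3:3181"
    for host, port in parse_quorum(quorum):
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

Running the script on the client host prints one line per quorum member, which at least rules out a wrong port or a firewall between the client and ZooKeeper.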

Everything works as long as recovery.mode is commented out.
As soon as I uncomment recovery.mode, the deployment on YARN gets stuck and
prints the following messages every second:

"Deploying cluster, current state ACCEPTED".
"Deployment took more than 60 seconds...."

I have more than enough resources available on my YARN cluster.

Do you have any idea what could cause this, and/or which logs I should look
at in order to understand it?

B.R.

Gwenhaël PASQUIERS
