Hello,

We're trying to set up high availability using an existing ZooKeeper quorum already running in our Cloudera cluster.
So, as per the docs, we've changed the maximum number of application attempts in YARN's config as well as in flink-conf.yaml:

    recovery.mode: zookeeper
    recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
    state.backend: filesystem
    state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
    recovery.zookeeper.storageDir: hdfs:///flink/recovery/
    yarn.application-attempts: 1000

Everything works as long as recovery.mode is commented out. As soon as I uncomment recovery.mode, the deployment on YARN gets stuck on:

    "Deploying cluster, current state ACCEPTED".
    "Deployment took more than 60 seconds...."

These messages are repeated every second, even though I have more than enough resources available on my YARN cluster.

Do you have any idea what could cause this, and/or which logs I should look at in order to understand it?

B.R.

Gwenhaël PASQUIERS