Never mind,

Looking at the logs, I saw that it was having issues trying to connect to ZK.
To make it short, I had the wrong port.

It is now starting.
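
For anyone hitting the same symptom, here is a minimal sketch of a pre-flight check that tries a plain TCP connection to every host:port entry of recovery.zookeeper.quorum. The helper names parse_quorum and is_reachable are made up for this example, and the quorum string is simply the one from the config quoted below. It only tells you whether something is listening on that port, not whether it is actually ZooKeeper.

```python
import socket


def parse_quorum(quorum):
    """Split 'host1:3181,host2:3181' into [('host1', 3181), ('host2', 3181)]."""
    endpoints = []
    for entry in quorum.split(","):
        # Assumes every entry has the form host:port.
        host, _, port = entry.strip().rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Hypothetical value, copied from the flink.yaml quoted in this thread.
    quorum = "host1:3181,host2:3181,host3:3181"
    for host, port in parse_quorum(quorum):
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

Running it from the machine that submits the job (and ideally from a YARN node as well) should show quickly whether the configured port is the one ZooKeeper really listens on. ZooKeeper's default client port is 2181, but a cluster can be configured differently.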

Tomorrow I’ll try to kill some JobManagers *evil*.

Another question: if I have multiple HA Flink jobs, is there anything I should 
check to make sure that they won’t collide on HDFS or ZK?

B.R.

Gwenhaël PASQUIERS

From: Till Rohrmann [mailto:till.rohrm...@gmail.com]
Sent: Wednesday, 18 November 2015 18:01
To: user@flink.apache.org
Subject: Re: YARN High Availability

Hi Gwenhaël,

do you have access to the yarn logs?

Cheers,
Till

On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers 
<gwenhael.pasqui...@ericsson.com> wrote:
Hello,

We’re trying to set up high availability using an existing zookeeper quorum 
already running in our Cloudera cluster.

So, as per the docs, we’ve changed the max attempts in YARN’s config as well as 
in flink.yaml.

recovery.mode: zookeeper
recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
recovery.zookeeper.storageDir: hdfs:///flink/recovery/
yarn.application-attempts: 1000

Everything is OK as long as recovery.mode is commented out.
As soon as I uncomment recovery.mode, the deployment on YARN gets stuck on:

“Deploying cluster, current state ACCEPTED”.
“Deployment took more than 60 seconds….”
Every second.

And I have more than enough resources available on my yarn cluster.

Do you have any idea what could cause this, and/or which logs I should look 
at in order to understand it?

B.R.

Gwenhaël PASQUIERS
