I agree with Aljoscha. Many companies install Flink (and its config) in a central directory and users share that installation.
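Until we randomize those paths automatically, each HA cluster started from such a shared installation needs its own ZooKeeper root path, as Till describes below. A minimal sketch of the per-cluster settings (the path names here are only examples):

    recovery.mode: zookeeper
    recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
    recovery.zookeeper.path.root: /flink/cluster-one

A second cluster would then use e.g. /flink/cluster-two; everything else can stay in the shared configuration.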
On Thu, Nov 19, 2015 at 10:45 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> I think we should find a way to randomize the paths where the HA stuff
> stores data. If users don’t realize that they store data in the same
> paths, this could lead to problems.
>
> > On 19 Nov 2015, at 08:50, Till Rohrmann <trohrm...@apache.org> wrote:
> >
> > Hi Gwenhaël,
> >
> > good to hear that you could resolve the problem.
> >
> > When you run multiple HA Flink jobs in the same cluster, you don’t
> > have to adjust the configuration of Flink. It should work out of the
> > box.
> >
> > However, if you run multiple HA Flink clusters, then you have to set a
> > distinct ZooKeeper root path for each cluster via the option
> > recovery.zookeeper.path.root in the Flink configuration. This is
> > necessary because otherwise all JobManagers (the ones of the different
> > clusters) will compete for a single leadership. Furthermore, all
> > TaskManagers will only see the one and only leader and connect to it.
> > The reason is that the TaskManagers look up their leader at a ZNode
> > below the ZooKeeper root path.
> >
> > If you have other questions, don’t hesitate to ask me.
> >
> > Cheers,
> > Till
> >
> > On Wed, Nov 18, 2015 at 6:37 PM, Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> wrote:
> > Never mind,
> >
> > looking at the logs I saw that it was having issues trying to connect
> > to ZK.
> >
> > To make it short: it had the wrong port.
> >
> > It is now starting.
> >
> > Tomorrow I’ll try to kill some JobManagers *evil*.
> >
> > Another question: if I have multiple HA Flink jobs, are there some
> > points to check in order to be sure that they won’t collide on HDFS
> > or ZK?
> >
> > B.R.
> >
> > Gwenhaël PASQUIERS
> >
> > From: Till Rohrmann [mailto:till.rohrm...@gmail.com]
> > Sent: Wednesday, 18 November 2015 18:01
> > To: user@flink.apache.org
> > Subject: Re: YARN High Availability
> >
> > Hi Gwenhaël,
> >
> > do you have access to the YARN logs?
> >
> > Cheers,
> > Till
> >
> > On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> wrote:
> >
> > Hello,
> >
> > we’re trying to set up high availability using an existing ZooKeeper
> > quorum that is already running in our Cloudera cluster.
> >
> > So, as per the documentation, we’ve changed the maximum number of
> > application attempts in YARN’s config as well as the following values
> > in flink.yaml:
> >
> > recovery.mode: zookeeper
> > recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
> > state.backend: filesystem
> > state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
> > recovery.zookeeper.storageDir: hdfs:///flink/recovery/
> > yarn.application-attempts: 1000
> >
> > Everything is OK as long as recovery.mode is commented out. As soon as
> > I uncomment recovery.mode, the deployment on YARN is stuck on:
> >
> > “Deploying cluster, current state ACCEPTED”
> > “Deployment took more than 60 seconds…”
> >
> > printed every second.
> >
> > And I have more than enough resources available on my YARN cluster.
> >
> > Do you have any idea of what could cause this, and/or which logs I
> > should look at in order to understand?
> >
> > B.R.
> >
> > Gwenhaël PASQUIERS
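P.S.: You don’t necessarily have to maintain a separate configuration file per cluster. If I remember correctly, yarn-session.sh accepts dynamic properties, so something like this should work (untested, and the path is again just an example):

    ./bin/yarn-session.sh -n 4 -Drecovery.zookeeper.path.root=/flink/cluster-one

That way every YARN session gets its own ZNode subtree without touching the central flink.yaml.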