Hey Konstantin,

That's weird. Can you please log the client output on DEBUG level and
provide that as well? I'm wondering whether the client uses a
different root path.
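
If you are on the default log4j setup, it should be enough to set the
client logger to DEBUG, e.g. in conf/log4j-cli.properties (or
conf/log4j-yarn-session.properties if you start the session via
yarn-session.sh; adjust if your distribution ships different files):

log4j.rootLogger=DEBUG, file

The client log should then end up in the log/ directory of the machine
from which you submitted.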

The following seems to happen:
- you use ledf_recovery as the root namespace
- the task managers are connecting (they resolve the JM address via
ZooKeeper in this case as well, which means they correctly use the
same namespace)
- but the client, which started the YARN session, never actually
  submits the job to the cluster.

– Ufuk

On Thu, Mar 31, 2016 at 9:23 AM, Konstantin Knauf
<konstantin.kn...@tngtech.com> wrote:
> Hi everyone,
>
> we are running into some problems with multiple per-job YARN sessions, too.
>
> When we start a per-job YARN session (Flink 1.0, Hadoop 2.4) with a
> recovery.zookeeper.path.root other than /flink, the YARN session
> starts, but no job is submitted, and after about a minute the session
> crashes. I attached the JobManager log.
>
> In ZooKeeper the root directory is created with the child nodes
>
> leaderlatch
> jobgraphs
>
> /flink also exists, but has no child nodes.
>
> Everything runs fine with the default recovery.zookeeper.path.root.
>
> Does anyone have an idea what is going on?
>
> Cheers,
>
> Konstantin
>
>
> On 23.11.2015 17:00, Gwenhael Pasquiers wrote:
>> We are not yet using HA in our cluster instances.
>>
>> But yes, we will have to change the zookeeper.path.root :-)
>>
>>
>>
>> We package our jobs with their own config folder (we don’t rely on
>> Flink’s config folder); we can put the Maven project name into this
>> property, and then they will have different values :-)
>>
>>
>>
>>
>>
>> *From:* Till Rohrmann [mailto:trohrm...@apache.org]
>> *Sent:* Monday, 23 November 2015 14:51
>> *To:* user@flink.apache.org
>> *Subject:* Re: YARN High Availability
>>
>>
>>
>> The problem is the execution graph handle which is stored in ZooKeeper.
>> You can manually remove it via the ZooKeeper shell by simply deleting
>> everything below your `recovery.zookeeper.path.root` ZNode. But you
>> should make sure that the cluster has been stopped before doing so.
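>>
>> A minimal sketch with the ZooKeeper CLI (the quorum address and node
>> names below are just examples, check what actually exists under your
>> root):
>>
>> bin/zkCli.sh -server host1:3181
>> ls /flink
>> rmr /flink/jobgraphs
>> rmr /flink/leaderlatch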
>>
>>
>>
>> Do you start the different clusters with different
>> `recovery.zookeeper.path.root` values? If not, then you will run into
>> trouble when running multiple clusters at the same time. The reason is
>> that all clusters will then think that they belong together.
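>>
>> For example (the path names are just placeholders), cluster A could use
>>
>> recovery.zookeeper.path.root: /flink-cluster-a
>>
>> and cluster B
>>
>> recovery.zookeeper.path.root: /flink-cluster-b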
>>
>>
>>
>> Cheers,
>>
>> Till
>>
>>
>>
>> On Mon, Nov 23, 2015 at 2:15 PM, Gwenhael Pasquiers
>> <gwenhael.pasqui...@ericsson.com
>> <mailto:gwenhael.pasqui...@ericsson.com>> wrote:
>>
>> OK, I understand.
>>
>> Maybe we are not really using Flink as you intended. The way we are
>> using it, one cluster equals one job. That way we are sure to isolate
>> the different jobs as much as possible, and in case of crashes, bugs,
>> etc. we can completely kill one cluster without interfering with the
>> other jobs.
>>
>> That future behavior seems good :-)
>>
>> Instead of the manual flink commands, is there a way to manually delete
>> those old jobs before launching my job? They are probably somewhere in
>> HDFS, aren't they?
>>
>> B.R.
>>
>>
>>
>> -----Original Message-----
>> From: Ufuk Celebi [mailto:u...@apache.org <mailto:u...@apache.org>]
>> Sent: Monday, 23 November 2015 12:12
>> To: user@flink.apache.org <mailto:user@flink.apache.org>
>> Subject: Re: YARN High Availability
>>
>> Hey Gwenhaël,
>>
>> the restarting jobs are most likely old job submissions. They are not
>> cleaned up when you shut down the cluster, but only when they finish
>> regularly or are cancelled.
>>
>> The workaround is to use the command line frontend:
>>
>> bin/flink cancel JOBID
>>
>> for each RESTARTING job. Sorry about the inconvenience!
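>>
>> To find the IDs of the RESTARTING jobs, the list command should help
>> (assuming the client picks up the same configuration that the cluster
>> was started with):
>>
>> bin/flink list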
>>
>> We are in an active discussion about addressing this. The future
>> behaviour will be that the startup or shutdown of a cluster cleans up
>> everything, with an option to skip this step.
>>
>> The reasoning for the initial solution (not removing anything) was to
>> make sure that no jobs are deleted by accident. But it looks like this
>> is more confusing than helpful.
>>
>> – Ufuk
>>
>>> On 23 Nov 2015, at 11:45, Gwenhael Pasquiers
>> <gwenhael.pasqui...@ericsson.com
>> <mailto:gwenhael.pasqui...@ericsson.com>> wrote:
>>>
>>> Hi again !
>>>
>>> On the same topic I'm still trying to start my streaming job with HA.
>>> The HA part seems to be more or less OK (I killed the JobManager and
>> it came back), however I have an issue with the TaskManagers.
>>> I configured my job to have only one TaskManager and 1 slot that does
>> [source=>map=>sink].
>>> The issue I'm encountering is that other instances of my job appear
>> and are in the RESTARTING status since there is only one task slot.
>>>
>>> Do you know about this, or have an idea of where to look in order to
>> understand what's happening?
>>>
>>> B.R.
>>>
>>> Gwenhaël PASQUIERS
>>>
>>> -----Original Message-----
>>> From: Maximilian Michels [mailto:m...@apache.org <mailto:m...@apache.org>]
>>> Sent: Thursday, 19 November 2015 13:36
>>> To: user@flink.apache.org <mailto:user@flink.apache.org>
>>> Subject: Re: YARN High Availability
>>>
>>> The docs have been updated.
>>>
>>> On Thu, Nov 19, 2015 at 12:36 PM, Ufuk Celebi <u...@apache.org
>> <mailto:u...@apache.org>> wrote:
>>>> I’ve added a note about this to the docs and asked Max to trigger a
>> new build of them.
>>>>
>>>> Regarding Aljoscha’s idea: I like it. It is essentially a shortcut
>> for configuring the root path.
>>>>
>>>> In any case, it is orthogonal to Till’s proposals. That one we need
>> to address as well (see FLINK-2929). The motivation for the current
>> behaviour was to be rather defensive when removing state in order to
>> not lose data accidentally. But it can be confusing, indeed.
>>>>
>>>> – Ufuk
>>>>
>>>>> On 19 Nov 2015, at 12:08, Till Rohrmann <trohrm...@apache.org
>> <mailto:trohrm...@apache.org>> wrote:
>>>>>
>>>>> You mean an additional start-up parameter for the `start-cluster.sh`
>> script for the HA case? That could work.
>>>>>
>>>>> On Thu, Nov 19, 2015 at 11:54 AM, Aljoscha Krettek
>> <aljos...@apache.org <mailto:aljos...@apache.org>> wrote:
>>>>> Maybe we could add a user parameter to specify a cluster name that
>> is used to make the paths unique.
>>>>>
>>>>>
>>>>> On Thu, Nov 19, 2015, 11:24 Till Rohrmann <trohrm...@apache.org
>> <mailto:trohrm...@apache.org>> wrote:
>>>>> I agree that this would make the configuration easier. However, it
>> also entails that the user has to retrieve the randomized path from the
>> logs if he wants to restart jobs after the cluster has crashed or been
>> intentionally restarted. Furthermore, the system won't be able to clean
>> up old checkpoint and job handles in case the cluster stop was
>> intentional.
>>>>>
>>>>> Thus, the question is how we define the behaviour for retrieving
>> handles and cleaning up old ones so that ZooKeeper won't be cluttered
>> with stale entries?
>>>>>
>>>>> There are basically two modes:
>>>>>
>>>>> 1. Keep state handles when shutting down the cluster. Provide a means
>> to define a fixed path when starting the cluster and also a means to
>> purge old state handles. Furthermore, add a shutdown mode where the
>> handles under the current path are directly removed. This mode would
>> guarantee that the state handles are always available unless explicitly
>> told otherwise. However, the downside is that ZooKeeper will most
>> certainly be cluttered.
>>>>>
>>>>> 2. Remove the state handles when shutting down the cluster. Provide
>> a shutdown mode where we keep the state handles. This will keep
>> ZooKeeper clean but will also give you the possibility to keep a
>> checkpoint around if necessary. However, the user is more likely to lose
>> his state when shutting down the cluster.
>>>>>
>>>>> On Thu, Nov 19, 2015 at 10:55 AM, Robert Metzger
>> <rmetz...@apache.org <mailto:rmetz...@apache.org>> wrote:
>>>>> I agree with Aljoscha. Many companies install Flink (and its config)
>> in a central directory and users share that installation.
>>>>>
>>>>> On Thu, Nov 19, 2015 at 10:45 AM, Aljoscha Krettek
>> <aljos...@apache.org <mailto:aljos...@apache.org>> wrote:
>>>>> I think we should find a way to randomize the paths where the HA
>> stuff stores data. If users don’t realize that they store data in the
>> same paths, this could lead to problems.
>>>>>
>>>>>> On 19 Nov 2015, at 08:50, Till Rohrmann <trohrm...@apache.org
>> <mailto:trohrm...@apache.org>> wrote:
>>>>>>
>>>>>> Hi Gwenhaël,
>>>>>>
>>>>>> good to hear that you could resolve the problem.
>>>>>>
>>>>>> When you run multiple HA flink jobs in the same cluster, then you
>> don’t have to adjust the configuration of Flink. It should work out of
>> the box.
>>>>>>
>>>>>> However, if you run multiple HA Flink clusters, then you have to set
>> a distinct ZooKeeper root path for each cluster via the option
>> recovery.zookeeper.path.root in the Flink configuration. This is
>> necessary because otherwise all JobManagers (the ones of the different
>> clusters) will compete for a single leadership. Furthermore, all
>> TaskManagers will only see the one and only leader and connect to it.
>> The reason is that the TaskManagers will look up their leader at a ZNode
>> below the ZooKeeper root path.
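>>>>>>
>>>>>> Roughly, the layout per cluster then looks like this (a sketch only;
>> the node names are the defaults as far as I remember and may differ in
>> your version), e.g. for a root path of /flink-cluster-a:
>>>>>>
>>>>>> /flink-cluster-a/leaderlatch
>>>>>> /flink-cluster-a/leader
>>>>>> /flink-cluster-a/jobgraphs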
>>>>>>
>>>>>> If you have other questions, don’t hesitate to ask me.
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 18, 2015 at 6:37 PM, Gwenhael Pasquiers
>> <gwenhael.pasqui...@ericsson.com
>> <mailto:gwenhael.pasqui...@ericsson.com>> wrote:
>>>>>> Nevermind,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Looking at the logs I saw that it was having issues trying to
>> connect to ZK.
>>>>>>
>>>>>> To make it short, it had the wrong port.
>>>>>>
>>>>>>
>>>>>>
>>>>>> It is now starting.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Tomorrow I’ll try to kill some JobManagers *evil*.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Another question: if I have multiple HA Flink jobs, are there some
>> points to check in order to be sure that they won’t collide on HDFS or ZK?
>>>>>>
>>>>>>
>>>>>>
>>>>>> B.R.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Gwenhaël PASQUIERS
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Till Rohrmann [mailto:till.rohrm...@gmail.com
>> <mailto:till.rohrm...@gmail.com>]
>>>>>> Sent: Wednesday, 18 November 2015 18:01
>>>>>> To: user@flink.apache.org <mailto:user@flink.apache.org>
>>>>>> Subject: Re: YARN High Availability
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Gwenhaël,
>>>>>>
>>>>>>
>>>>>>
>>>>>> do you have access to the yarn logs?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Till
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers
>> <gwenhael.pasqui...@ericsson.com
>> <mailto:gwenhael.pasqui...@ericsson.com>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>>
>>>>>>
>>>>>> We’re trying to set up high availability using an existing
>> zookeeper quorum already running in our Cloudera cluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So, as per the docs we’ve changed the max attempts in YARN’s config
>> as well as in the flink.yaml.
>>>>>>
>>>>>>
>>>>>>
>>>>>> recovery.mode: zookeeper
>>>>>>
>>>>>> recovery.zookeeper.quorum: host1:3181,host2:3181,host3:3181
>>>>>>
>>>>>> state.backend: filesystem
>>>>>>
>>>>>> state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
>>>>>>
>>>>>> recovery.zookeeper.storageDir: hdfs:///flink/recovery/
>>>>>>
>>>>>> yarn.application-attempts: 1000
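>>>>>>
>>>>>> For reference, the corresponding YARN-side property we meant above is
>> yarn.resourcemanager.am.max-attempts in yarn-site.xml (the value below is
>> just an example):
>>>>>>
>>>>>> <property>
>>>>>>   <name>yarn.resourcemanager.am.max-attempts</name>
>>>>>>   <value>1000</value>
>>>>>> </property>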
>>>>>>
>>>>>>
>>>>>>
>>>>>> Everything is OK as long as recovery.mode is commented out.
>>>>>>
>>>>>> As soon as I uncomment recovery.mode, the deployment on YARN is
>> stuck on:
>>>>>>
>>>>>>
>>>>>>
>>>>>> “Deploying cluster, current state ACCEPTED”.
>>>>>>
>>>>>> “Deployment took more than 60 seconds….”
>>>>>>
>>>>>> Every second.
>>>>>>
>>>>>>
>>>>>>
>>>>>> And I have more than enough resources available on my yarn cluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Do you have any idea of what could cause this, and/or what logs I
>> should look for in order to understand?
>>>>>>
>>>>>>
>>>>>>
>>>>>> B.R.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Gwenhaël PASQUIERS
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>> <unwanted_jobs.jpg>
>>
>>
>>
>
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
