Re: [DISCUSS] FLIP-6 - Flink Deployment and Process Model - Standalone, Yarn, Mesos, Kubernetes, etc.

Stephan Ewen Fri, 05 Aug 2016 07:32:21 -0700

@Aljoscha

I would not make the ResourceManager a subcomponent of the JobManager.
While that may be simpler initially, I would like to keep the door open to
let RM and JM run in different processes/nodes.


Also, for Yarn/Mesos sessions, the ResourceManager may run longer than the
JobManager.

On Sun, Jul 31, 2016 at 6:58 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> +1
>
> I don't have much to say since this already seems very well worked out.
> Just some small remarks:
>  - This sentence that describes TaskManager behavior will probably have to
> be adapted for FLIP-1, correct? "Loss of connection to the JobManager
> results in triggering master-failure recovery (currently: cancel all tasks
> from that master)"
>  - For docker mode there is this sentence: "To start a Flink job, one
> configures a service to start one container of the Job/JobManager image,
> and N containers of the TaskManager image." This can be achieved with
> Docker compose. We already use this in the docker image that we have in the
> Flink source.
>  - The design mentions that the ResourceManager should be long running,
> especially longer than JobManager lifetime. However, this is only true for
> standalone mode and not for Yarn or Mesos which I think will be the two
> more important deployment modes. In those two modes it becomes basically a
> sub-component of the JobManager. Should this be made more prominent in the
> description of the ResourceManager?
>
> Cheers,
> Aljoscha
>
> On Fri, 29 Jul 2016 at 15:26 Wright, Eron <ewri...@live.com> wrote:
>
> > The design looks great - it solves for very diverse deployment modes,
> > allows for heterogeneous TMs, and promotes job isolation.
> >
> > Some feedback:
> >
> > *Dispatcher*
> > The dispatcher concept here expands nicely on what was introduced in the
> > Mesos design doc (MESOS-1984).  The most significant difference is the
> > job-centric orientation of the dispatcher API.   FLIP-6 seems to eliminate
> > the concept of a session (or defines it simply as the lifecycle of a JM);
> > is that correct?    Do you agree I should revise the Mesos dispatcher
> > design to be job-centric?
> >
> > I'll be taking the first crack at implementing the dispatcher (for Mesos
> > only) in MESOS-1984 (T2).   I’ll keep FLIP-6 in mind as I go.
> >
> > The dispatcher's backend behavior will vary significantly for Mesos vs
> > standalone vs others.   Presumably a base class with concrete
> > implementations will be introduced.  To echo the FLIP-6 design as I
> > understand it:
> >
> > 1) Standalone
> >    a) The dispatcher process starts an RM, dispatcher frontend, and
> > "local" dispatcher backend at startup.
> >    b) Upon job submission, the local dispatcher backend creates an
> > in-process JM actor for the job.
> >    c) The JM allocates slots as normal.   The RM draws from its pool of
> > registered TMs, which grows and shrinks due (only) to external events.
> >
> > 2) Mesos
> >    a) The dispatcher process starts a dispatcher frontend and "Mesos"
> > dispatcher backend at startup.
> >    b) Upon job submission, the Mesos dispatcher backend creates a Mesos
> > task (dubbed an "AppMaster") which contains a JM/RM for the job.
> >    c) The system otherwise functions as described in the Mesos design
> > doc.
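
[Editor's sketch] The two backends outlined above could share one dispatcher frontend roughly as follows. This is only an illustration of the "base class with concrete implementations" idea, written here as an interface; every name in it (DispatcherBackend, LocalBackend, MesosBackend, launchJob) is hypothetical and not taken from Flink or from the FLIP-6 document, and the backends merely record what they would launch.

```java
import java.util.ArrayList;
import java.util.List;

final class DispatcherSketch {

    /** Hypothetical abstraction: how a submitted job obtains its JobManager. */
    interface DispatcherBackend {
        String launchJob(String jobName);
    }

    /** Standalone: the JM is created in-process; the RM was started with the dispatcher. */
    static final class LocalBackend implements DispatcherBackend {
        final List<String> inProcessJobManagers = new ArrayList<>();

        @Override
        public String launchJob(String jobName) {
            inProcessJobManagers.add(jobName); // stand-in for creating a JM actor
            return "in-process JM for " + jobName;
        }
    }

    /** Mesos: one "AppMaster" task per job, containing both JM and RM. */
    static final class MesosBackend implements DispatcherBackend {
        final List<String> launchedAppMasters = new ArrayList<>();

        @Override
        public String launchJob(String jobName) {
            launchedAppMasters.add(jobName); // stand-in for launching a Mesos task
            return "Mesos AppMaster (JM+RM) for " + jobName;
        }
    }

    /** Frontend shared by all deployment modes; only the backend differs. */
    static final class Dispatcher {
        private final DispatcherBackend backend;

        Dispatcher(DispatcherBackend backend) {
            this.backend = backend;
        }

        String submit(String jobName) {
            return backend.launchJob(jobName);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Dispatcher(new LocalBackend()).submit("wordcount"));
        System.out.println(new Dispatcher(new MesosBackend()).submit("wordcount"));
    }
}
```

The point of the split is that job submission looks identical to callers, while the decision about where the JM (and, on Mesos, the RM) lives stays inside the backend.
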
> >
> > *Client*
> > I'm concerned about the two code paths that the client uses to launch a
> > job (with-dispatcher vs without-dispatcher).   Maybe it could be unified by
> > saying that the client always calls the dispatcher, and that the dispatcher
> > is hostable in either the client or in a separate process.  The only
> > variance would be the client-to-dispatcher transport (local vs HTTP).
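
[Editor's sketch] One way to picture the single client code path suggested above: the client talks to a gateway interface, and only the gateway implementation knows whether the dispatcher is in-process or remote. All names here (DispatcherGateway, LocalGateway, HttpGateway, the "/jobs" path) are hypothetical. The HTTP variant does not open a connection; it only records the request it would send.

```java
import java.util.function.Function;

final class ClientSketch {

    /** Hypothetical client-side view of the dispatcher, independent of transport. */
    interface DispatcherGateway {
        String submitJob(String jobName);
    }

    /** Dispatcher hosted inside the client process: a plain method call. */
    static final class LocalGateway implements DispatcherGateway {
        private final Function<String, String> inProcessDispatcher;

        LocalGateway(Function<String, String> inProcessDispatcher) {
            this.inProcessDispatcher = inProcessDispatcher;
        }

        @Override
        public String submitJob(String jobName) {
            return inProcessDispatcher.apply(jobName);
        }
    }

    /** Dispatcher in a separate process: stand-in that records the request, no real HTTP. */
    static final class HttpGateway implements DispatcherGateway {
        private final String baseUrl;
        String lastRequest; // what would have been sent over the wire

        HttpGateway(String baseUrl) {
            this.baseUrl = baseUrl;
        }

        @Override
        public String submitJob(String jobName) {
            lastRequest = "POST " + baseUrl + "/jobs name=" + jobName; // assumed endpoint
            return "submitted " + jobName + " via HTTP";
        }
    }

    /** The one client code path: it never branches on the deployment mode. */
    static String runJob(DispatcherGateway gateway, String jobName) {
        return gateway.submitJob(jobName);
    }

    public static void main(String[] args) {
        System.out.println(runJob(new LocalGateway(job -> "started " + job + " locally"), "wordcount"));
        System.out.println(runJob(new HttpGateway("http://dispatcher:8081"), "wordcount"));
    }
}
```
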
> >
> > *RM*
> > On the issue of RM statefulness, we can say that the RM does not persist
> > slot allocation (the ground truth is in the TM), but may persist other
> > information (related to cluster manager interaction).  For example, the
> > Mesos RM persists the assigned framework identifier and per-task planning
> > information (as is highly recommended by the Mesos development guide).
> >
> > On RM fencing, I was already wondering whether to add it to the Mesos RM,
> > so it is nice to see it being introduced more generally.   My rationale is
> > that the dispatcher cannot guarantee that only a single RM is running,
> > because orphaned tasks are possible in certain Mesos failure situations.
> > Similarly, I’m unsure whether YARN provides a strong guarantee about the
> > AM.
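
[Editor's sketch] The fencing concern above can be illustrated with a minimal example in which a TaskManager ignores messages from a superseded RM. It assumes a monotonically increasing epoch number per RM incarnation; that is a simplification chosen for this sketch and not necessarily the mechanism the FLIP-6 document specifies. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FencingSketch {

    /** TaskManager-side check: only the newest RM incarnation seen so far is obeyed. */
    static final class TaskManager {
        private long currentRmEpoch = -1; // no RM seen yet
        final List<String> acceptedMessages = new ArrayList<>();

        /** Returns true if the message was accepted, false if it came from a stale RM. */
        synchronized boolean handleRmMessage(long rmEpoch, String message) {
            if (rmEpoch < currentRmEpoch) {
                return false; // sender was superseded, e.g. an orphaned RM task
            }
            currentRmEpoch = rmEpoch; // remember the newest incarnation
            acceptedMessages.add(message);
            return true;
        }
    }

    public static void main(String[] args) {
        TaskManager tm = new TaskManager();
        System.out.println(tm.handleRmMessage(1, "allocate slot 0")); // first RM
        System.out.println(tm.handleRmMessage(2, "allocate slot 1")); // replacement RM
        System.out.println(tm.handleRmMessage(1, "release slot 0"));  // orphaned old RM
    }
}
```

With a check like this, two RMs running at once after a failure cannot both drive the same TaskManager; the older one is ignored as soon as the newer one has made contact.
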
> >
> > *User Code*
> > Having job code on the system classpath seems possible in only a subset of
> > cases.   The variability may be complex.   How important is this
> > optimization?
> >
> > *Security Implications*
> > It should be noted that the standalone embodiment doesn't offer isolation
> > between jobs.  The whole system will have a single security context (as it
> > does now).
> >
> > Meanwhile, the ‘high-trust’ nature of the dispatcher in other scenarios is
> > rightly emphasized.  The fact that user code shouldn't be run in the
> > dispatcher process (except in standalone) must be kept in mind.   The
> > design doc of FLINK-3929 (section C2) has more detail on that.
> >
> >
> > -Eron
> >
> >
> > > On Jul 28, 2016, at 2:22 AM, Maximilian Michels <m...@apache.org> wrote:
> > >
> > > Hi Stephan,
> > >
> > > Thanks for the nice wrap-up of ideas and discussions we had over the
> > > last months (not all on the mailing list though because we were just
> > > getting started with the FLIP process). The document is very
> > > comprehensive and explains the changes in great detail, even down to
> > > the message passing level.
> > >
> > > What I really like about the FLIP is that we delegate multi-tenancy
> > > away from the JobManager to the resource management framework and the
> > > dispatchers. This will help to make the JobManager component cleaner
> > > and simpler. The prospect of having the user jars directly in the
> > > system classpath of the workers, instead of dealing with custom class
> > > loaders, is very nice.
> > >
> > > The model we have for acquiring and releasing resources wouldn't work
> > > particularly well with all the new deployment options, so +1 on a new
> > > task slot request/offer system and +1 for making the ResourceManager
> > > responsible for TaskManager registration and slot management. This is
> > > well aligned with the initial idea of the ResourceManager component.
> > >
> > > We definitely need good testing for these changes since the
> > > possibility of bugs increases with the additional number of messages
> > > introduced.
> > >
> > > The only thing that bugs me is whether we make the Standalone mode a
> > > bit less nice to use. The initial bootstrapping of the nodes via the
> > > local dispatchers and the subsequent registration of TaskManagers and
> > > allocation of slots could cause some delay. It's not a major concern
> > > though because it will take little time compared to the actual job run
> > > time (unless you run a tiny WordCount).
> > >
> > > Cheers,
> > > Max
> > >
> > >
> > >
> > >
> > > On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen <se...@apache.org> wrote:
> > >> Hi all!
> > >>
> > >> Here comes a pretty big FLIP: "Improvements to the Flink Deployment and
> > >> Process Model", to better support Yarn, Mesos, Kubernetes, and whatever
> > >> else Google, Elon Musk, and all the other folks will think up next.
> > >>
> > >>
> > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
> > >>
> > >> It is a pretty big FLIP where I took input and thoughts from many people,
> > >> like Till, Max, Xiaowei (and his colleagues), Eron, and others.
> > >>
> > >> The core ideas revolve around
> > >>  - making the JobManager at its core a per-job component (handle
> > >> multi-tenancy outside the JobManager)
> > >>  - making resource acquisition and release more dynamic
> > >>  - tying deployments more naturally to jobs where desirable
> > >>
> > >>
> > >> Let's get the discussion started...
> > >>
> > >> Greetings,
> > >> Stephan
> >
> >
>
