Hi Eron! Some comments on your comments:
*Dispatcher*
- The dispatcher should NOT be job-centric. The dispatcher should take over
  the "multi-job" responsibilities here, now that the JobManager is
  single-job only.
- An abstract dispatcher would be great. It could implement the
  connection/HTTP elements and have an abstract method to start a job
  (rough sketch below):
    -> Yarn - use YarnClusterClient to start a YarnJob
    -> Mesos - same thing
    -> Standalone - spawn a JobManager

*Client*
This is an interesting point. Max is currently refactoring the clients into
- Cluster Client (with specializations for Yarn and Standalone) to launch
  jobs and control a cluster (yarn session, ...)
- Job Client, which is connected to a single job and can issue commands to
  that job (cancel/stop/checkpoint/savepoint/change-parallelism)
Let's try to get his input on this.

*RM*
Agreed - the base RM is "stateless"; specialized RMs can behave differently
if they need to. RM fencing must be generic - all cluster types can suffer
from orphaned tasks (Yarn as well, I think).

*User Code*
I think in the cases where processes/containers are launched per job, this
should always be feasible. It is a nice optimization that I think we should
do wherever possible - it makes users' lives much easier with respect to
classloading. Some cases with custom class loading are currently tough in
Flink; that way, those jobs would at least run in the Yarn/Mesos per-job
mode (not in the session mode, though - that one still needs dynamic class
loading).

*Standalone Security*
That is a known limitation and fine for now, I think. Whoever wants proper
security needs to go to Yarn/Mesos initially. Standalone v2.0 may change
that.
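To make the abstract dispatcher a bit more concrete, here is a rough Java
sketch of the shape I have in mind. The class and method names are only
placeholders, not a proposal for the actual interfaces:

    import org.apache.flink.runtime.jobgraph.JobGraph;

    /** Sketch only: common dispatcher logic, shared by all cluster modes. */
    abstract class Dispatcher {

        /** Shared part: entry point behind the HTTP/connection handling. */
        public final void submitJob(JobGraph job) throws Exception {
            // common steps: validate the job, persist the JobGraph for
            // recovery, bookkeeping for the "multi-job" responsibilities, ...
            startJob(job);
        }

        /** Cluster-specific part: bring up the per-job JobManager. */
        protected abstract void startJob(JobGraph job) throws Exception;
    }

    /** Standalone: spawn a JobManager for the job inside this process. */
    class StandaloneDispatcher extends Dispatcher {
        @Override
        protected void startJob(JobGraph job) throws Exception {
            // create the JobManager locally and hand it the JobGraph
        }
    }

    // A YarnDispatcher would use the YarnClusterClient to start a per-job
    // Yarn application, and a MesosDispatcher would launch an "AppMaster"
    // task that contains the JM (+ RM) for the job.

That way the connection/HTTP handling stays identical across all cluster
types, and only startJob() differs per cluster framework.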
Greetings,
Stephan

On Sat, Jul 30, 2016 at 12:26 AM, Wright, Eron <ewri...@live.com> wrote:

> The design looks great - it solves for very diverse deployment modes,
> allows for heterogeneous TMs, and promotes job isolation.
>
> Some feedback:
>
> *Dispatcher*
> The dispatcher concept here expands nicely on what was introduced in the
> Mesos design doc (MESOS-1984). The most significant difference is the
> job-centric orientation of the dispatcher API. FLIP-6 seems to eliminate
> the concept of a session (or defines it simply as the lifecycle of a JM);
> is that correct? Do you agree I should revise the Mesos dispatcher design
> to be job-centric?
>
> I'll be taking the first crack at implementing the dispatcher (for Mesos
> only) in MESOS-1984 (T2). I'll keep FLIP-6 in mind as I go.
>
> The dispatcher's backend behavior will vary significantly for Mesos vs
> standalone vs others. Presumably a base class with concrete
> implementations will be introduced. To echo the FLIP-6 design as I
> understand it:
>
> 1) Standalone
>    a) The dispatcher process starts an RM, dispatcher frontend, and
>       "local" dispatcher backend at startup.
>    b) Upon job submission, the local dispatcher backend creates an
>       in-process JM actor for the job.
>    c) The JM allocates slots as normal. The RM draws from its pool of
>       registered TMs, which grows and shrinks due (only) to external
>       events.
>
> 2) Mesos
>    a) The dispatcher process starts a dispatcher frontend and "Mesos"
>       dispatcher backend at startup.
>    b) Upon job submission, the Mesos dispatcher backend creates a Mesos
>       task (dubbed an "AppMaster") which contains a JM/RM for the job.
>    c) The system otherwise functions as described in the Mesos design doc.
>
> *Client*
> I'm concerned about the two code paths that the client uses to launch a
> job (with-dispatcher vs without-dispatcher). Maybe it could be unified by
> saying that the client always calls the dispatcher, and that the
> dispatcher is hostable in either the client or in a separate process. The
> only variance would be the client-to-dispatcher transport (local vs HTTP).
>
> *RM*
> On the issue of RM statefulness, we can say that the RM does not persist
> slot allocations (the ground truth is in the TMs), but may persist other
> information (related to cluster manager interaction). For example, the
> Mesos RM persists the assigned framework identifier and per-task planning
> information (as is highly recommended by the Mesos development guide).
>
> On RM fencing, I was already wondering whether to add it to the Mesos RM,
> so it is nice to see it being introduced more generally. My rationale is
> that the dispatcher cannot guarantee that only a single RM is running,
> because orphaned tasks are possible in certain Mesos failure situations.
> Similarly, I'm unsure whether YARN provides a strong guarantee about the
> AM.
>
> *User Code*
> Having job code on the system classpath seems possible in only a subset
> of cases. The variability may be complex. How important is this
> optimization?
>
> *Security Implications*
> It should be noted that the standalone embodiment doesn't offer isolation
> between jobs. The whole system will have a single security context (as it
> does now).
>
> Meanwhile, the 'high-trust' nature of the dispatcher in other scenarios
> is rightly emphasized. The fact that user code shouldn't be run in the
> dispatcher process (except in standalone) must be kept in mind. The
> design doc of FLINK-3929 (section C2) has more detail on that.
>
>
> -Eron
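To Eron's client point above: a rough sketch of the "client always calls
the dispatcher" idea, reusing the Dispatcher class from the sketch further
up (again, placeholder names only, not an API proposal):

    import java.net.URI;
    import org.apache.flink.runtime.jobgraph.JobGraph;

    /** The client always submits through this gateway, whatever the transport. */
    interface DispatcherGateway {
        void submitJob(JobGraph job) throws Exception;
    }

    /** Dispatcher hosted inside the client process: a plain method call. */
    class LocalDispatcherGateway implements DispatcherGateway {
        private final Dispatcher dispatcher;

        LocalDispatcherGateway(Dispatcher dispatcher) {
            this.dispatcher = dispatcher;
        }

        @Override
        public void submitJob(JobGraph job) throws Exception {
            dispatcher.submitJob(job);
        }
    }

    /** Dispatcher in a separate process: same call, HTTP transport underneath. */
    class HttpDispatcherGateway implements DispatcherGateway {
        private final URI dispatcherEndpoint;

        HttpDispatcherGateway(URI dispatcherEndpoint) {
            this.dispatcherEndpoint = dispatcherEndpoint;
        }

        @Override
        public void submitJob(JobGraph job) throws Exception {
            // serialize the JobGraph and POST it to dispatcherEndpoint
            // (transport details omitted in this sketch)
            throw new UnsupportedOperationException("sketch only");
        }
    }

The client code would be identical in both cases; only how the gateway is
constructed differs (in-process vs pointing at a remote dispatcher's HTTP
endpoint).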
>
>
> On Jul 28, 2016, at 2:22 AM, Maximilian Michels <m...@apache.org> wrote:
> >
> > Hi Stephan,
> >
> > Thanks for the nice wrap-up of ideas and discussions we had over the
> > last months (not all on the mailing list though, because we were just
> > getting started with the FLIP process). The document is very
> > comprehensive and explains the changes in great detail, even down to
> > the message-passing level.
> >
> > What I really like about the FLIP is that we delegate multi-tenancy
> > away from the JobManager to the resource management framework and the
> > dispatchers. This will help to make the JobManager component cleaner
> > and simpler. The prospect of having the user jars directly in the
> > system classpath of the workers, instead of dealing with custom class
> > loaders, is very nice.
> >
> > The model we have for acquiring and releasing resources wouldn't work
> > particularly well with all the new deployment options, so +1 on a new
> > task slot request/offer system and +1 for making the ResourceManager
> > responsible for TaskManager registration and slot management. This is
> > well aligned with the initial idea of the ResourceManager component.
> >
> > We definitely need good testing for these changes, since the
> > possibility of bugs increases with the additional number of messages
> > introduced.
> >
> > The only thing that bugs me is whether we make the standalone mode a
> > bit less nice to use. The initial bootstrapping of the nodes via the
> > local dispatchers and the subsequent registration of TaskManagers and
> > allocation of slots could cause some delay. It's not a major concern
> > though, because it will take little time compared to the actual job
> > run time (unless you run a tiny WordCount).
> >
> > Cheers,
> > Max
> >
> >
> >
> >
> > On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen <se...@apache.org> wrote:
> >> Hi all!
> >>
> >> Here comes a pretty big FLIP: "Improvements to the Flink Deployment and
> >> Process Model", to better support Yarn, Mesos, Kubernetes, and whatever
> >> else Google, Elon Musk, and all the other folks will think up next.
> >>
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
> >>
> >> It is a pretty big FLIP where I took input and thoughts from many
> >> people, like Till, Max, Xiaowei (and his colleagues), Eron, and others.
> >>
> >> The core ideas revolve around
> >> - making the JobManager in its core a per-job component (handle multi
> >>   tenancy outside the JobManager)
> >> - making resource acquisition and release more dynamic
> >> - tying deployments more naturally to jobs where desirable
> >>
> >>
> >> Let's get the discussion started...
> >>
> >> Greetings,
> >> Stephan
> >