Re: [DISCUSS] FLIP-6 Problems

Renjie Liu Tue, 05 Jun 2018 17:13:24 -0700

Hi, Stephan:

Yes, that's what I mean. In fact, the most important thing is to share the
dispatcher so that we can have *a centralized gateway for Flink job
management and submission. The problem with a per-job cluster is that we
can't have a centralized gateway.*


I didn't realize before that the job manager also needs to run user code,
and yes, that means the job manager should also be isolated.

Wouldn't it be better to separate the job manager from the dispatcher so
that user code from different jobs doesn't interfere with each other? In
fact, it seems that most production environments require job isolation,
since nobody wants their job to be affected by others.

On Tue, Jun 5, 2018 at 11:34 PM Stephan Ewen <[email protected]> wrote:

> Hi Renjie,
>
> When you suggest having TaskManager isolation in session mode, do you mean
> to have a shared JobManager / Dispatcher, but job-specific TaskManagers?
> Is this mainly to reduce the overhead of the per-job JobManager?
>
> One assumption so far was that if TaskManager isolation is required,
> JobManager isolation is also required, because some user code potentially
> also runs on the JobManager, like CheckpointHooks, Input/Output Formats,
> ...
>
> Best,
> Stephan
>
>
>
> On Tue, Jun 5, 2018 at 4:20 PM, Renjie Liu <[email protected]>
> wrote:
>
> > Hi, Till:
> >
> >
> >    1. Does the community have any plans to add task manager isolation to
> >    the session mode?
> >    2. Are there any issues tracking this feature? I want to help
> >    contribute.
> >    3. Thanks for the information, but it doesn't help if task manager
> >    isolation is not present.
> >
> >
> > On Tue, Jun 5, 2018 at 7:28 PM Till Rohrmann <[email protected]>
> > wrote:
> >
> > > Hi Renjie,
> > >
> > > 1) You're right that the Flink session mode does not give you proper job
> > > isolation. It is the same as with the Flink 1.4 session mode. If this is a
> > > strong requirement for you, then I recommend using the per-job mode.
> > >
> > > 2) At the moment it is also not possible to define per-job resource
> > > requirements when using the session mode. This is a feature which the
> > > community has started implementing, but it is not yet fully done. I assume
> > > that the community will continue working on it. At the moment, the solution
> > > would be to use the per-job mode to avoid wasting resources.
> > >
> > > 3) I think the assigned ResourceID for a TaskManager is shown in the web UI
> > > and when querying the "/taskmanagers" REST endpoint. The resource ID is
> > > derived from the Mesos task ID. Would that help to identify which TM is
> > > running on which Mesos task?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jun 5, 2018 at 5:13 AM Renjie Liu <[email protected]>
> > > wrote:
> > >
> > > > ---------- Forwarded message ---------
> > > > From: Renjie Liu <[email protected]>
> > > > Date: Tue, Jun 5, 2018 at 10:43 AM
> > > > Subject: [DISCUSS] FLIP-6 Problems
> > > > To: user <[email protected]>
> > > >
> > > >
> > > > Hi:
> > > >
> > > > We've deployed Flink 1.5.0 and tested the new cluster manager. It's really
> > > > great for Flink to be elastic. However, we've also found some problems that
> > > > block us from deploying it to a production environment.
> > > >
> > > > 1. Task manager isolation. Currently Flink allows different jobs to execute
> > > > on the same task managers. This is unacceptable in a production environment,
> > > > since a badly written job may kill task managers and affect other jobs.
> > > > 2. Per-job resource configuration. Currently a Flink session cluster can only
> > > > allocate task managers of the same size and configuration. This may waste a
> > > > lot of resources if we have many jobs with different resource requirements.
> > > > 3. The task manager's name is meaningless. This is a problem since we can't
> > > > monitor the status of containers in a Mesos environment.
> > > >
> > > > One solution to the above problems is to use a per-job cluster, but a
> > > > centralized cluster manager can help to manage Flink deployments and jobs
> > > > better.
> > > >
> > > > What do you think about these? If the community agrees with us, we would
> > > > like to propose a design and implementation to enhance the Flink cluster
> > > > manager.
> > > > --
> > > > Liu, Renjie
> > > > Software Engineer, MVAD
> > > >
> > >
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
>
-- 
Liu, Renjie
Software Engineer, MVAD
