That's really great! I'd like to help contribute to this.

On Wed, Jun 6, 2018 at 3:17 PM Till Rohrmann <trohrm...@apache.org> wrote:
> Hi Renjie,
>
> there is already an issue for introducing further scheduling constraints
> (e.g. tags) to achieve TM isolation when using the session mode [1]. What
> it does not cover is the isolation of the JMs, which need to be executed in
> their own processes. At the moment they share the same process with the
> Dispatcher because it was simpler to do it like that as a first iteration.
> Here is the issue for isolating JobManagers [2].
>
> Concerning the resource specification, the corresponding issue can be found
> here [3].
>
> [1] https://issues.apache.org/jira/browse/FLINK-8886
> [2] https://issues.apache.org/jira/browse/FLINK-9537
> [3] https://issues.apache.org/jira/browse/FLINK-5131
>
> Cheers,
> Till
>
> On Wed, Jun 6, 2018 at 2:13 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>
> > Hi, Stephan:
> >
> > Yes, that's what I mean. In fact the most important thing is to share the
> > dispatcher so that we can have *a centralized gateway for flink job
> > management and submission. The problem with per job cluster is that we
> > can't have a centralized gateway.*
> >
> > I didn't realize before that the job manager also needs to run user code,
> > and yes, that means the job manager should also be isolated.
> >
> > Wouldn't it be better to separate the job manager from the dispatcher so
> > that user code from different jobs doesn't interfere with each other? In
> > fact it seems that in most production environments job isolation is
> > required since nobody wants their job to be affected by others.
> >
> > On Tue, Jun 5, 2018 at 11:34 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > > Hi Renjie,
> > >
> > > When you suggest to have TaskManager isolation in session mode, do you
> > > mean to have a shared JobManager / Dispatcher, but job-specific
> > > TaskManagers? Is this mainly to reduce the overhead of the per-job
> > > JobManager?
> > >
> > > One assumption so far was that if TaskManager isolation is required,
> > > JobManager isolation is also required, because some user code
> > > potentially also runs on the JobManager, like CheckpointHooks,
> > > Input/Output Formats, ...
> > >
> > > Best,
> > > Stephan
> > >
> > > On Tue, Jun 5, 2018 at 4:20 PM, Renjie Liu <liurenjie2...@gmail.com>
> > > wrote:
> > >
> > > > Hi, Till:
> > > >
> > > > 1. Does the community have any plan to add task manager isolation
> > > >    to the session mode?
> > > > 2. Are there any issues tracking this feature? I want to help
> > > >    contribute.
> > > > 3. Thanks for the information, but it doesn't help if task manager
> > > >    isolation is not present.
> > > >
> > > > On Tue, Jun 5, 2018 at 7:28 PM Till Rohrmann <trohrm...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi Renjie,
> > > > >
> > > > > 1) you're right that the Flink session mode does not give you
> > > > > proper job isolation. It is the same as with Flink 1.4 session
> > > > > mode. If this is a strong requirement for you, then I recommend
> > > > > using the per job mode.
> > > > >
> > > > > 2) At the moment it is also not possible to define per job resource
> > > > > requirements when using the session mode. This is a feature which
> > > > > the community has started implementing but it is not yet fully
> > > > > done. I assume that the community will continue working on it. At
> > > > > the moment, the solution would be to use the per job mode to avoid
> > > > > wasting resources.
> > > > >
> > > > > 3) I think the assigned ResourceID for a TaskManager is shown in
> > > > > the web UI and when querying the "/taskmanagers" REST endpoint. The
> > > > > resource id is derived from the Mesos task id. Would that help to
> > > > > identify which TM is running on which Mesos task?
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jun 5, 2018 at 5:13 AM Renjie Liu <liurenjie2...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > ---------- Forwarded message ---------
> > > > > > From: Renjie Liu <liurenjie2...@gmail.com>
> > > > > > Date: Tue, Jun 5, 2018 at 10:43 AM
> > > > > > Subject: [DISCUSS] FLIP-6 Problems
> > > > > > To: user <u...@flink.apache.org>
> > > > > >
> > > > > > Hi:
> > > > > >
> > > > > > We've deployed flink 1.5.0 and tested the new cluster manager;
> > > > > > it's really great for flink to be elastic. However, we've also
> > > > > > found some problems that block us from deploying it to a
> > > > > > production environment.
> > > > > >
> > > > > > 1. Task manager isolation. Currently flink allows different jobs
> > > > > > to execute on the same task managers. This is unacceptable in a
> > > > > > production environment since a badly written job may kill task
> > > > > > managers and affect other jobs.
> > > > > > 2. Per job resource configuration. Currently a flink session
> > > > > > cluster can only allocate task managers of the same size and
> > > > > > configuration. This may waste a lot of resources if we have many
> > > > > > jobs with different resource requirements.
> > > > > > 3. Task manager names are meaningless. This is a problem since we
> > > > > > can't monitor the status of containers in a mesos environment.
> > > > > >
> > > > > > One solution to the above problems is to use per job clusters,
> > > > > > but a centralized cluster manager can help to manage flink
> > > > > > deployment and jobs better.
> > > > > >
> > > > > > What do you think about these? If the community agrees with us,
> > > > > > we would like to propose a design and implementation to enhance
> > > > > > the flink cluster manager.
> > > > > > --
> > > > > > Liu, Renjie
> > > > > > Software Engineer, MVAD
> > > > > >
> > > > > > --
> > > > > > Liu, Renjie
> > > > > > Software Engineer, MVAD
> > > > > >
> > > >
> > > > --
> > > > Liu, Renjie
> > > > Software Engineer, MVAD
> > > >
> >
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
>
--
Liu, Renjie
Software Engineer, MVAD
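
P.S. On point 3 in the quoted thread (reading the ResourceID from the "/taskmanagers" REST endpoint so it can be matched against Mesos task ids), here is a minimal Python sketch of how that lookup might be done. It rests on assumptions that are not confirmed anywhere in this thread: that the endpoint returns a JSON object with a "taskmanagers" array whose entries carry an "id" field, and that the REST server is reachable at a base URL such as http://localhost:8081. The helper names and the sample payload are made up for illustration.

```python
import json
from urllib.request import urlopen


def extract_taskmanager_ids(payload: str) -> list:
    """Return the "id" of every entry under "taskmanagers".

    Assumes the response shape {"taskmanagers": [{"id": ...}, ...]};
    entries without an "id" field are skipped.
    """
    data = json.loads(payload)
    return [tm["id"] for tm in data.get("taskmanagers", []) if "id" in tm]


def fetch_taskmanager_ids(base_url: str) -> list:
    """Query <base_url>/taskmanagers and return the reported ids.

    base_url is a hypothetical address, e.g. "http://localhost:8081".
    """
    with urlopen(base_url.rstrip("/") + "/taskmanagers") as resp:
        return extract_taskmanager_ids(resp.read().decode("utf-8"))


# Offline example with a made-up payload (not real cluster output).
SAMPLE = '{"taskmanagers": [{"id": "tm-00001"}, {"id": "tm-00002"}]}'
print(extract_taskmanager_ids(SAMPLE))
```

The ids returned this way could then be compared with the task ids Mesos reports. Whether the two line up exactly depends on how the resource id is derived from the Mesos task id, which should be checked on a real cluster.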