@Aljoscha I would not make the ResourceManager a subcomponent of the JobManager. While that may be simpler initially, I would like to keep the door open to let RM and JM run in different processes/nodes.
Also, for Yarn/Mesos sessions, the ResourceManager may run longer than the JobManager. On Sun, Jul 31, 2016 at 6:58 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > +1 > > I don't have much to say since this already seems very well worked out. > Just some small remarks: > - This sentence that describes TaskManager behavior will probably have to > be adapted for FLIP-1, correct? "Loss of connection to the JobManager > results in triggering master-failure recovery (currently: cancel all tasks > form that master)" > - For docker mode there is this sentence: "To start a Flink job, one > configures a service to start one container of the Job/JobManager image, > and N containers of the TaskManager image." This can be achieved with > Docker compose. We already use this in the docker image that we have in the > Flink source. > - The design mentions that the ResourceManager should be long running, > especially longer than JobManager lifetime. However, this is only true for > standalone mode and not for Yarn or Mesos which I think will be the two > more important deployment modes. In those two modes it becomes basically a > sub-component of the JobManager. Should this be made more prominent in the > description of the ResourceManager? > > Cheers, > Aljoscha > > On Fri, 29 Jul 2016 at 15:26 Wright, Eron <ewri...@live.com> wrote: > > > The design looks great - it solves for very diverse deployment modes, > > allows for heterogeneous TMs, and promotes job isolation. > > > > Some feedback: > > > > *Dispatcher* > > The dispatcher concept here expands nicely on what was introduced in the > > Mesos design doc (MESOS-1984). The most significant difference being the > > job-centric orientation of the dispatcher API. FLIP-6 seems to > eliminate > > the concept of a session (or, defines it simply as the lifecycle of a > JM); > > is that correct? Do you agree I should revise the Mesos dispatcher > > design to be job-centric? > > > > I'll be taking the first crack at implementing the dispatcher (for Mesos > > only) in MESOS-1984 (T2). I’ll keep FLIP-6 in mind as I go. > > > > The dispatcher's backend behavior will vary significantly for Mesos vs > > standalone vs others. Assumedly a base class with concrete > > implementations will be introduced. To echo the FLIP-6 design as I > > understand it: > > > > 1) Standalone > > a) The dispatcher process starts an RM, dispatcher frontend, and > > "local" dispatcher backend at startup. > > b) Upon job submission, the local dispatcher backend creates an > > in-process JM actor for the job. > > c) The JM allocates slots as normal. The RM draws from its pool of > > registered TM, which grows and shrinks due (only) to external events. > > > > 2) Mesos > > a) The dispatcher process starts a dispatcher frontend and "Mesos" > > dispatcher backend at startup. > > b) Upon job submission, the Mesos dispatcher backend creates a Mesos > > task (dubbed an "AppMaster") which contains a JM/RM for the job. > > c) The system otherwise functions as described in the Mesos design > doc. > > > > *Client* > > I'm concerned about the two code paths that the client uses to launch a > > job (with-dispatcher vs without-dispatcher). Maybe it could be unified > by > > saying that the client always calls the dispatcher, and that the > dispatcher > > is hostable in either the client or in a separate process. The only > > variance would be the client-to-dispatcher transport (local vs HTTP). > > > > *RM* > > On the issue of RM statefulness, we can say that the RM does not persist > > slot allocation (the ground truth is in the TM), but may persist other > > information (related to cluster manager interaction). For example, the > > Mesos RM persists the assigned framework identifier and per-task planning > > information (as is highly recommended by the Mesos development guide). > > > > On RM fencing, I was already wondering whether to add it to the Mesos RM, > > so it is nice to see it being introduced more generally. My rationale > is, > > the dispatcher cannot guarantee that only a single RM is running, because > > orphaned tasks are possible in certain Mesos failure situations. > > Similarly, I’m unsure whether YARN provides a strong guarantee about the > > AM. > > > > *User Code* > > Having job code on the system classpath seems possible in only a subset > of > > cases. The variability may be complex. How important is this > > optimization? > > > > *Security Implications* > > It should be noted that the standalone embodiment doesn't offer isolation > > between jobs. The whole system will have a single security context (as > it > > does now). > > > > Meanwhile, the ‘high-trust’ nature of the dispatcher in other scenarios > is > > rightly emphasized. The fact that user code shouldn't be run in the > > dispatcher process (except in standalone) must be kept in mind. The > > design doc of FLINK-3929 (section C2) has more detail on that. > > > > > > -Eron > > > > > > > On Jul 28, 2016, at 2:22 AM, Maximilian Michels <m...@apache.org> > wrote: > > > > > > Hi Stephan, > > > > > > Thanks for the nice wrap-up of ideas and discussions we had over the > > > last months (not all on the mailing list though because we were just > > > getting started with the FLIP process). The document is very > > > comprehensive and explains the changes in great details, even up to > > > the message passing level. > > > > > > What I really like about the FLIP is that we delegate multi-tenancy > > > away from the JobManager to the resource management framework and the > > > dispatchers. This will help to make the JobManager component cleaner > > > and simpler. The prospect of having the user jars directly in the > > > system classpath of the workers, instead of dealing with custom class > > > loaders, is very nice. > > > > > > The model we have for acquiring and releasing resources wouldn't work > > > particularly well with all the new deployment options, so +1 on a new > > > task slot request/offer system and +1 for making the ResourceManager > > > responsible for TaskManager registration and slot management. This is > > > well aligned with the initial idea of the ResourceManager component. > > > > > > We definitely need good testing for these changes since the > > > possibility of bugs increases with the additional number of messages > > > introduced. > > > > > > The only thing that bugs me is whether we make the Standalone mode a > > > bit less nice to use. The initial bootstrapping of the nodes via the > > > local dispatchers and the subsequent registration of TaskManagers and > > > allocation of slots could cause some delay. It's not a major concern > > > though because it will take little time compared to the actual job run > > > time (unless you run a tiny WordCount). > > > > > > Cheers, > > > Max > > > > > > > > > > > > > > > On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen <se...@apache.org> > wrote: > > >> Hi all! > > >> > > >> Here comes a pretty big FLIP: "Improvements to the Flink Deployment > and > > >> Process Model", to better support Yarn, Mesos, Kubernetes, and > whatever > > >> else Google, Elon Musk, and all the other folks will think up next. > > >> > > >> > > https://cwiki.apache.org/confluence/pages/viewpage. > action?pageId=65147077 > > >> > > >> It is a pretty big FLIP where I took input and thoughts from many > > people, > > >> like Till, Max, Xiaowei (and his colleagues), Eron, and others. > > >> > > >> The core ideas revolve around > > >> - making the JobManager in its core a per-job component (handle multi > > >> tenancey outside the JobManager) > > >> - making resource acquisition and release more dynamic > > >> - tying deployments more naturally to jobs where desirable > > >> > > >> > > >> Let's get the discussion started... > > >> > > >> Greetings, > > >> Stephan > > > > >