I think we can agree that real multi-user support in Flink (standalone) is neither desirable, because sophisticated solutions already exist (YARN or Mesos), nor feasible, because it is a lot of work to get right.
As things stand, resource sharing between two users who submit jobs at the same time is not properly handled. However, this discussion showed that it is desirable to support submitting multiple jobs to a single Flink cluster. This could be realized with a simple queuing system in which jobs are executed one after another.

Once resuming jobs from intermediate results is supported (which should be soon), this model should still let multiple clients refer to past jobs: the job manager simply holds a list of old ExecutionGraphs for each user session. When the user ends the session or a timeout occurs, the corresponding graphs are archived. This requires some form of session management.

tl;dr I propose to drop the multi-user support that we have now. Instead, let's have a one-job-at-a-time usage model with a queuing system and, eventually, session management to deal with resuming from already materialized results.

What do you think?

On Thu, Apr 30, 2015 at 11:09 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> There was an attempt to build such a queue during the Dopa project when
> Flink was still Stratosphere.
> Probably it could be a good idea to collect the good and bad things learned
> from it to start designing the new scheduler :)
>
> On Thu, Apr 30, 2015 at 10:08 AM, Stephan Ewen <se...@apache.org> wrote:
>
> > Most components are written multi-job aware.
> >
> > The only thing that is not in there right now is scheduling policies for
> > fair resource sharing. This is important in shared clusters.
> >
> > Since YARN implements all those things (various job queues with different
> > priorities/policies etc), I suggest to not try and re-build it in Flink and
> > simply declare a JobManager a "single-job-at-a-time" manager. You can still
> > run an interactive session with many jobs one after another.
> >
> > On Wed, Apr 29, 2015 at 7:07 PM, Maximilian Michels <m...@apache.org> wrote:
> >
> > > > However, dropping it completely instead of improving it would make Flink
> > > > setups on dedicated clusters quite useless, right?
> > >
> > > Not really, because you could also use YARN on dedicated clusters for
> > > proper multi-user support.
> > >
> > > On Wed, Apr 29, 2015 at 5:51 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> > >
> > > > I agree that Flink's multi-user support is not very good at the moment.
> > > > However, dropping it completely instead of improving it would make Flink
> > > > setups on dedicated clusters quite useless, right?
> > > >
> > > > 2015-04-29 17:33 GMT+02:00 Maximilian Michels <m...@apache.org>:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Currently Flink accepts jobs from multiple clients and executes them
> > > > > concurrently if the resource limits are not exceeded. However, the
> > > > > multi-user support is very poor. We don't support queuing of jobs and
> > > > > concurrent jobs have to share resources in a nice way. Otherwise, jobs
> > > > > will fail.
> > > > >
> > > > > Using YARN, we circumvent these problems because it provides a proper
> > > > > user and session management. I'm wondering now, should we get rid of the
> > > > > pseudo multi-user mode and just support one user per Flink cluster
> > > > > instance?
> > > > >
> > > > > Best,
> > > > > Max
> > > > >
> > > > > PS:
> > > > > This question came up when I was working on a pull request to support
> > > > > backtracking intermediate results. I need to hold a copy of the full
> > > > > previous execution graph to resume from old results. With multiple users,
> > > > > we have to build in some kind of session management to archive old
> > > > > execution graphs. Otherwise, they will consume too much memory in the
> > > > > job manager.
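
To make the one-job-at-a-time proposal at the top of this message a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not Flink code: `SingleJobManager`, `Session`, `run_next`, and `expire_sessions` are hypothetical names, jobs are plain callables, and their return values stand in for the ExecutionGraphs that the real job manager would hold per session.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-user session; `graphs` stands in for old ExecutionGraphs."""
    last_active: float
    graphs: list = field(default_factory=list)


class SingleJobManager:
    """Toy model: FIFO job queue, one job at a time, sessions that time out."""

    def __init__(self, session_timeout=60.0, clock=time.monotonic):
        self.queue = deque()      # pending (session_id, job) pairs
        self.sessions = {}        # session_id -> Session
        self.archive = []         # graphs of ended or expired sessions
        self.session_timeout = session_timeout
        self.clock = clock

    def submit(self, session_id, job):
        """Enqueue a job (a zero-argument callable) and refresh its session."""
        now = self.clock()
        session = self.sessions.setdefault(session_id, Session(last_active=now))
        session.last_active = now
        self.queue.append((session_id, job))

    def run_next(self):
        """Run the oldest queued job to completion; return None if the queue is empty."""
        if not self.queue:
            return None
        session_id, job = self.queue.popleft()
        graph = job()
        session = self.sessions.get(session_id)
        if session is None:
            # The session ended while the job was queued: archive right away.
            self.archive.append(graph)
        else:
            session.graphs.append(graph)
            session.last_active = self.clock()
        return graph

    def end_session(self, session_id):
        """Archive all graphs held for a session and forget the session."""
        session = self.sessions.pop(session_id, None)
        if session is not None:
            self.archive.extend(session.graphs)

    def expire_sessions(self):
        """End every session idle for longer than the timeout; return their ids."""
        now = self.clock()
        expired = [sid for sid, s in self.sessions.items()
                   if now - s.last_active > self.session_timeout]
        for sid in expired:
            self.end_session(sid)
        return expired
```

A client would call `submit` any number of times, while a single loop calls `run_next` so that jobs never run concurrently. Calling `expire_sessions` periodically keeps old graphs from piling up in memory, which is the concern raised in the PS of the original mail.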