On first thought, the sessions and the multi-job vs. job queue question are almost two separate issues.
Can you add the sessions without removing the concurrent jobs we currently have?

On Wed, May 13, 2015 at 10:34 AM, Maximilian Michels <m...@apache.org> wrote:

> I think we can agree that real multi-user support in Flink (standalone) is
> neither desirable, because there are already sophisticated solutions out
> there (YARN or Mesos), nor feasible because it is a lot of work to get it
> right.
>
> At the current state of affairs, resource sharing between two users
> submitting a job at the same time, is not properly handled. However, this
> discussion showed that it is desirable to have support for submitting
> multiple job to a single Flink cluster. This could be realized using a
> simple queuing system in which jobs are executed one after another.
>
> In case of the soon to be supported resuming of jobs from intermediate
> results, this should still enable multiple clients to refer to past jobs.
> The job manager simply holds a list of old ExecutionGraphs for each user
> session. When the user ends the session or a timeout occurs, the
> corresponding graph is archived. This poses some sort of session
> management.
>
> tl;dr I propose to drop the multi-user support that we have now. Instead,
> let's have a one-job-at-a-time usage model with a queuing system and
> eventually a session management to deal with resuming from already
> materialized results.
>
> What do you think?
>
> On Thu, Apr 30, 2015 at 11:09 AM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
> > There was an attempt to build such a queue during the Dopa project when
> > Flink was still Stratosphere.
> > Probably it could be a good idea to collect the good and bad things
> > learned from it to start designing the new scheduler :)
> >
> > On Thu, Apr 30, 2015 at 10:08 AM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Most components are written multi-job aware.
> > >
> > > The only thing that is not in there right now is scheduling policies
> > > for fair resource sharing. This is important in shared clusters.
> > >
> > > Since YARN implements all those things (various job queues with
> > > different priorities/policies etc), I suggest to not try and re-build
> > > it in Flink and simply declare a JobManager a "single-job-at-a-time"
> > > manager. You can still run an interactive session with many jobs one
> > > after another.
> > >
> > > On Wed, Apr 29, 2015 at 7:07 PM, Maximilian Michels <m...@apache.org>
> > > wrote:
> > >
> > > > > However, dropping it completely instead of improving it would make
> > > > > Flink setups on dedicated clusters quite useless, right?
> > > >
> > > > Not really, because you could also use YARN on dedicated clusters for
> > > > proper multi-user support.
> > > >
> > > > On Wed, Apr 29, 2015 at 5:51 PM, Fabian Hueske <fhue...@gmail.com>
> > > > wrote:
> > > >
> > > > > I agree that Flink's multi-user support is not very good at the
> > > > > moment. However, dropping it completely instead of improving it
> > > > > would make Flink setups on dedicated clusters quite useless, right?
> > > > >
> > > > > 2015-04-29 17:33 GMT+02:00 Maximilian Michels <m...@apache.org>:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Currently Flink accepts jobs from multiple clients and executes
> > > > > > them concurrently if the resource limits are not exceeded.
> > > > > > However, the multi-user support is very poor. We don't support
> > > > > > queuing of jobs and concurrent jobs have to share resources in a
> > > > > > nice way. Otherwise, jobs will fail.
> > > > > >
> > > > > > Using YARN, we circumvent these problems because it provides a
> > > > > > proper user and session management. I'm wondering now, should we
> > > > > > get rid of the pseudo multi-user mode and just support one user
> > > > > > per Flink cluster instance?
> > > > > >
> > > > > > Best,
> > > > > > Max
> > > > > >
> > > > > > PS:
> > > > > > This question came up when I was working on a pull request to
> > > > > > support backtracking intermediate results. I need to hold a copy
> > > > > > of the full previous execution graph to resume from old results.
> > > > > > With multiple users, we have to build in some kind of session
> > > > > > management to archive old execution graphs. Otherwise, they will
> > > > > > consume too much memory in the job manager.
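
To make the quoted proposal easier to discuss (one job at a time, a queue, and a per-session list of old execution graphs that is archived when the session ends or times out), here is a rough toy model in Python. It is not Flink code and not how the JobManager is implemented: `SingleJobManager`, `Session`, `session_timeout`, and the explicit `now` argument are all made up for illustration, and an execution graph is stood in for by a plain string.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-client session holding graphs of its finished jobs."""
    last_active: float
    graphs: list = field(default_factory=list)


class SingleJobManager:
    """Toy model only: FIFO job queue plus sessions that expire after a timeout."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.queue = deque()   # pending (session_id, job_name) pairs
        self.sessions = {}     # session_id -> Session
        self.archive = []      # graphs of ended or expired sessions

    def _touch(self, session_id, now):
        # Create the session on first use and refresh its activity time.
        session = self.sessions.setdefault(session_id, Session(last_active=now))
        session.last_active = now
        return session

    def submit(self, session_id, job_name, now):
        # Jobs are only queued here; nothing runs concurrently.
        self._touch(session_id, now)
        self.queue.append((session_id, job_name))

    def run_next(self, now):
        # "Run" the oldest queued job and keep its graph in the owning session.
        if not self.queue:
            return None
        session_id, job_name = self.queue.popleft()
        graph = f"ExecutionGraph({job_name})"  # stand-in for a real graph
        self._touch(session_id, now).graphs.append(graph)
        return graph

    def end_session(self, session_id):
        # Move the session's graphs to the archive and forget the session.
        session = self.sessions.pop(session_id, None)
        if session is not None:
            self.archive.extend(session.graphs)

    def expire_sessions(self, now):
        # Archive every session that has been idle longer than the timeout.
        for session_id, session in list(self.sessions.items()):
            if now - session.last_active > self.session_timeout:
                self.end_session(session_id)
```

In this model the queue and the session bookkeeping do not depend on each other: replacing `run_next` with something that runs several jobs at once would leave `end_session` and `expire_sessions` untouched, which is one way to read the "two separate issues" point at the top of this mail.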