There was an attempt to build such a queue during the Dopa project, when Flink was still Stratosphere. It would probably be a good idea to collect the good and bad lessons learned from it before starting to design the new scheduler :)

On Thu, Apr 30, 2015 at 10:08 AM, Stephan Ewen <se...@apache.org> wrote:

> Most components are written multi-job aware.
>
> The only thing that is not in there right now is scheduling policies for
> fair resource sharing. This is important in shared clusters.
>
> Since YARN implements all those things (various job queues with different
> priorities/policies etc), I suggest to not try and re-build it in Flink and
> simply declare a JobManager a "single-job-at-a-time" manager. You can still
> run an interactive session with many jobs one after another.
>
>
> On Wed, Apr 29, 2015 at 7:07 PM, Maximilian Michels <m...@apache.org>
> wrote:
>
> > >
> > > However, dropping it completely instead of improving it would make
> > > Flink setups on dedicated clusters quite useless, right?
> > >
> >
> > Not really, because you could also use YARN on dedicated clusters for
> > proper multi-user support.
> >
> > On Wed, Apr 29, 2015 at 5:51 PM, Fabian Hueske <fhue...@gmail.com>
> > wrote:
> >
> > > I agree that Flink's multi-user support is not very good at the moment.
> > > However, dropping it completely instead of improving it would make
> > > Flink setups on dedicated clusters quite useless, right?
> > >
> > >
> > > 2015-04-29 17:33 GMT+02:00 Maximilian Michels <m...@apache.org>:
> > >
> > > > Hi everyone,
> > > >
> > > > Currently Flink accepts jobs from multiple clients and executes them
> > > > concurrently if the resource limits are not exceeded. However, the
> > > > multi-user support is very poor. We don't support queuing of jobs and
> > > > concurrent jobs have to share resources in a nice way. Otherwise,
> > > > jobs will fail.
> > > >
> > > > Using YARN, we circumvent these problems because it provides a proper
> > > > user and session management. I'm wondering now, should we get rid of
> > > > the pseudo multi-user mode and just support one user per Flink
> > > > cluster instance?
> > > >
> > > > Best,
> > > > Max
> > > >
> > > > PS:
> > > > This question came up when I was working on a pull request to support
> > > > backtracking intermediate results. I need to hold a copy of the full
> > > > previous execution graph to resume from old results. With multiple
> > > > users, we have to build in some kind of session management to archive
> > > > old execution graphs. Otherwise, they will consume too much memory in
> > > > the job manager.