Hi Till, Thanks for the pointer, glad that this is being worked on.
It almost looks like the non deterministic distribution behavior started with 1.5.x (?) and that surprised us. https://issues.apache.org/jira/browse/BEAM-5713 I agree that there is no one strategy that fits every use case. If an application is limited by a resource per machine that the scheduler does not understand (like let's say CPU or disk I/O), then it would be nice to have a way to hint that round-robin distribution is desired (or achieve the same through anti-affinity or resource constraints). Thanks, Thomas On Fri, Oct 12, 2018 at 2:06 AM Till Rohrmann <trohrm...@apache.org> wrote: > Hi Max, > > the community is currently working on Flink's scheduler component [1]. One > of the things we want to enable in the future is bulk scheduling. With > this, it should also be possible to add strategies how to distribute tasks > across multiple TMs (spreading vs. co-locating). > > In general, I'm not 100% sure whether spreading out tasks is always the > best strategy. Especially if you have a network heavy job co-locating tasks > on the same TM could have benefits over spreading the tasks out. > > [1] https://issues.apache.org/jira/browse/FLINK-10429 > > Cheers, > Till > > On Thu, Oct 11, 2018 at 8:16 PM Maximilian Michels <m...@apache.org> wrote: > > > Hi everyone, > > > > I've recently come across a cluster scheduling problem users are facing. > > Clusters where TaskManagers have more slots than the parallelism > > (#tm_slots > job_parallelism), tend to schedule all job tasks on a > > single TaskManager. > > > > This is not good for spreading load and has been discussed in FLINK-1003 > > [1] and the other duplicate JIRA issues. > > > > I know that this is not really an issue if the cluster is created > > exclusively for the Job, or if the number of slots per Taskmanager is > > smaller than the parallelism. However, this seems like a rather easy > > improvement to the Scheduler which would have a huge impact on > performance. > > > > On the JIRA issue page it has been mentioned that this was put on hold > > to work on dynamic scaling first. > > > > Now that the basic building blocks for dynamic scaling are in place, do > > you think it would be possible to tackle FLINK-1003? > > > > Thanks, > > Max > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-1003 > > >