Hi Till,

Thanks for the pointers. I started looking into this, and it does not seem to be too complicated to add a new strategy :)
I will try to put together a PR, but I would greatly appreciate it if you could check it out once I'm done, as I don't have much experience with these components.

Cheers,
Gyula

Till Rohrmann <trohrm...@apache.org> wrote (on Fri, 17 Jun 2016, 11:56):

> Hi Gyula,
>
> the scheduler actually deploys independent tasks in a round-robin fashion
> across the cluster. So, for example, your source sub-tasks should be spread
> evenly. However, whenever a sub-task has an input, the scheduler tries to
> deploy that task on the same machine as one of the input sub-tasks
> (preferred locations). If you have an n-to-m communication scheme, this
> means that all downstream sub-tasks depend on the same set of upstream
> sub-tasks. The task manager of the first upstream sub-task which has some
> free slots left is selected for the next downstream sub-task.
>
> This is the point where the clustered deployment happens, because we simply
> take the first TaskManager instead of checking that we evenly spread the
> load across all TaskManagers which execute upstream sub-tasks. I think
> it should not be a big change to add a mode where we spread the load across
> all TaskManagers which are in the set of preferred locations. The changes
> should go into the findInstance method in Scheduler.java:463.
>
> Do you want to take the lead on this feature?
>
> Cheers,
> Till
>
> On Mon, Jun 13, 2016 at 3:11 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> > Thanks! I found this PR already, but it seemed to be completely outdated :)
> >
> > Maybe it's worth restarting this discussion.
> >
> > Gyula
> >
> > Chesnay Schepler <ches...@apache.org> wrote (on Mon, 13 Jun 2016, 14:58):
> >
> > > FLINK-1003 may be related.
> > >
> > > On 13.06.2016 12:46, Gyula Fóra wrote:
> > > > Hey,
> > > >
> > > > The Flink scheduling mechanism has lately become quite a pain for us
> > > > when trying to schedule IO-heavy streaming jobs. By IO-heavy I mean
> > > > the job has a fairly large state that is continuously updated and read.
> > > >
> > > > The main problem is that the scheduled task slots are not evenly
> > > > distributed among the different task managers: usually the first TM
> > > > takes as many slots as possible and the other TMs get far fewer. And
> > > > since the job is RocksDB IO-bound, the uneven load causes a
> > > > significant performance penalty.
> > > >
> > > > This is further accentuated during historical runs when we are trying
> > > > to "fast-forward" the application. The difference can be quite
> > > > substantial on a 3-4 node cluster: with an even task distribution the
> > > > history might run 3 times faster compared to an uneven one.
> > > >
> > > > I was wondering if there was a simple way to modify the scheduler so
> > > > it allocates resources in a round-robin fashion. Probably someone has
> > > > a lot of experience with this already :) (I'm running 1.0.3 for this
> > > > job, btw.)
> > > >
> > > > Cheers,
> > > > Gyula
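For anyone following along, the change Till describes (cycling through the preferred TaskManagers instead of always taking the first one with a free slot) could look roughly like the sketch below. This is only an illustration of the idea, not Flink's actual Scheduler API: the `Instance` class, `freeSlots` field, and `pick` method are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin slot selection over a preferred-
// location set. In the current behavior described above, the FIRST
// instance with a free slot is always chosen, which clusters tasks on
// one TaskManager; here a rotating cursor spreads them evenly.
public class RoundRobinPicker {

    static class Instance {
        final String name;
        int freeSlots;
        Instance(String name, int freeSlots) {
            this.name = name;
            this.freeSlots = freeSlots;
        }
    }

    private int next = 0; // rotating cursor over the preferred list

    /**
     * Returns the next preferred instance that still has a free slot,
     * starting from the rotating cursor, or null if none has capacity.
     */
    Instance pick(List<Instance> preferred) {
        for (int i = 0; i < preferred.size(); i++) {
            Instance cand = preferred.get((next + i) % preferred.size());
            if (cand.freeSlots > 0) {
                next = (next + i + 1) % preferred.size();
                cand.freeSlots--;
                return cand;
            }
        }
        return null; // no preferred instance has capacity left
    }

    public static void main(String[] args) {
        RoundRobinPicker picker = new RoundRobinPicker();
        List<Instance> tms = new ArrayList<>();
        tms.add(new Instance("tm1", 2));
        tms.add(new Instance("tm2", 2));
        tms.add(new Instance("tm3", 2));
        // Six downstream sub-tasks land tm1, tm2, tm3, tm1, tm2, tm3
        // instead of filling tm1's slots first.
        for (int i = 0; i < 6; i++) {
            System.out.println(picker.pick(tms).name);
        }
    }
}
```

With a first-fit strategy the same six tasks would go tm1, tm1, tm2, tm2, tm3, tm3, which is exactly the clustering that hurts IO-bound jobs.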