Hi Till,

Thanks for the pointers, I started looking into this and it does not seem
too complicated to add a new strategy :)

I will try to put together a PR, but I would greatly appreciate it if you
could check it out once I'm done, as I don't have much experience with
these components.

Cheers,
Gyula

Till Rohrmann <trohrm...@apache.org> wrote (on Fri, 17 Jun 2016 at
11:56):

> Hi Gyula,
>
> the scheduler actually deploys independent tasks in a round-robin fashion
> across the cluster, so your source sub-tasks, for example, should be
> spread evenly. However, whenever a sub-task has an input, the scheduler
> tries to deploy it on the same machine as one of its input sub-tasks (the
> preferred locations). With an n-to-m communication scheme, this means that
> all downstream sub-tasks depend on the same set of upstream sub-tasks. The
> task manager of the first upstream sub-task that still has free slots is
> selected for the next downstream sub-task.
>
> This is the point where the clustered deployment happens, because we
> simply take the first TaskManager instead of checking that we spread the
> load evenly across all TaskManagers executing upstream sub-tasks. I think
> it should not be a big change to add a mode where we spread the load
> across all TaskManagers in the set of preferred locations. The changes
> should go into the findInstance method in Scheduler.java:463.
>
> Do you want to take the lead for this feature?
>
> Cheers,
> Till
>
> On Mon, Jun 13, 2016 at 3:11 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> > Thanks! I found this PR already, but it seemed to be completely outdated :)
> >
> > Maybe it's worth restarting this discussion.
> >
> > Gyula
> >
> > Chesnay Schepler <ches...@apache.org> wrote (on Mon, 13 Jun 2016 at
> > 14:58):
> >
> > > FLINK-1003 may be related.
> > >
> > > On 13.06.2016 12:46, Gyula Fóra wrote:
> > > > Hey,
> > > >
> > > > The Flink scheduling mechanism has become quite a pain for us lately
> > > > when trying to schedule IO-heavy streaming jobs. By IO heavy I mean
> > > > jobs with fairly large state that is continuously updated and read.
> > > >
> > > > The main problem is that the scheduled task slots are not evenly
> > > > distributed among the task managers: usually the first TM takes as
> > > > many slots as possible and the other TMs get far fewer. Since the
> > > > job is RocksDB IO bound, the uneven load causes a significant
> > > > performance penalty.
> > > >
> > > > This is further accentuated during historical runs when we are
> > > > trying to "fast-forward" the application. The difference can be
> > > > quite substantial in a 3-4 node cluster: with even task distribution
> > > > the history might run 3 times faster than with an uneven one.
> > > >
> > > > I was wondering if there was a simple way to modify the scheduler so
> > > > it allocates resources in a round-robin fashion. Probably someone
> > > > already has experience with this :) (I'm running 1.0.3 for this job,
> > > > btw.)
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > >
> > >
> >
>
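The round-robin spreading Till describes above could be sketched roughly as below. This is a hedged illustration of the idea only, not Flink's actual Scheduler code; the `Instance` and `hasFreeSlot` names are hypothetical stand-ins for the real types used around `findInstance` in Scheduler.java.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a TaskManager instance with some slot capacity.
// Not Flink's real API; for illustration only.
class Instance {
    private int freeSlots;
    Instance(int freeSlots) { this.freeSlots = freeSlots; }
    boolean hasFreeSlot() { return freeSlots > 0; }
}

class RoundRobinPicker {
    private int nextIndex = 0;

    // Cycle through the preferred locations, starting where the last pick
    // left off, so the load spreads evenly instead of piling onto the first
    // TaskManager that happens to have a free slot.
    Instance pick(List<Instance> preferred) {
        int n = preferred.size();
        for (int i = 0; i < n; i++) {
            Instance candidate = preferred.get((nextIndex + i) % n);
            if (candidate.hasFreeSlot()) {
                nextIndex = (nextIndex + i + 1) % n;
                return candidate;
            }
        }
        return null; // no preferred TaskManager has capacity left
    }

    public static void main(String[] args) {
        Instance a = new Instance(2);
        Instance b = new Instance(2);
        Instance c = new Instance(0); // a full TaskManager
        List<Instance> preferred = Arrays.asList(a, b, c);
        RoundRobinPicker picker = new RoundRobinPicker();
        System.out.println(picker.pick(preferred) == a); // true
        System.out.println(picker.pick(preferred) == b); // true
        // c has no free slot, so the next pick wraps around to a
        System.out.println(picker.pick(preferred) == a); // true
    }
}
```

The key difference from the behavior described in the thread is the moving `nextIndex`: the existing scheduler always starts from the first preferred TaskManager, while this sketch resumes the scan where the previous allocation stopped.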
