Re: Spreading Tasks across TaskManagers

Thomas Weise Fri, 12 Oct 2018 16:41:57 -0700

Hi Till,

Thanks for the pointer, glad that this is being worked on.


It almost looks like the non deterministic distribution behavior started
with 1.5.x (?) and that surprised us.

https://issues.apache.org/jira/browse/BEAM-5713

I agree that there is no one strategy that fits every use case. If an
application is limited by a resource per machine that the scheduler does
not understand (like let's say CPU or disk I/O), then it would be nice to
have a way to hint that round-robin distribution is desired (or achieve the
same through anti-affinity or resource constraints).

Thanks,
Thomas



On Fri, Oct 12, 2018 at 2:06 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Max,
>
> the community is currently working on Flink's scheduler component [1]. One
> of the things we want to enable in the future is bulk scheduling. With
> this, it should also be possible to add strategies how to distribute tasks
> across multiple TMs (spreading vs. co-locating).
>
> In general, I'm not 100% sure whether spreading out tasks is always the
> best strategy. Especially if you have a network heavy job co-locating tasks
> on the same TM could have benefits over spreading the tasks out.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10429
>
> Cheers,
> Till
>
> On Thu, Oct 11, 2018 at 8:16 PM Maximilian Michels <m...@apache.org> wrote:
>
> > Hi everyone,
> >
> > I've recently come across a cluster scheduling problem users are facing.
> > Clusters where TaskManagers have more slots than the parallelism
> > (#tm_slots > job_parallelism), tend to schedule all job tasks on a
> > single TaskManager.
> >
> > This is not good for spreading load and has been discussed in FLINK-1003
> > [1] and the other duplicate JIRA issues.
> >
> > I know that this is not really an issue if the cluster is created
> > exclusively for the Job, or if the number of slots per Taskmanager is
> > smaller than the parallelism. However, this seems like a rather easy
> > improvement to the Scheduler which would have a huge impact on
> performance.
> >
> > On the JIRA issue page it has been mentioned that this was put on hold
> > to work on dynamic scaling first.
> >
> > Now that the basic building blocks for dynamic scaling are in place, do
> > you think it would be possible to tackle FLINK-1003?
> >
> > Thanks,
> > Max
> >
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-1003
> >
>

Re: Spreading Tasks across TaskManagers

Reply via email to