Re: Spreading Tasks across TaskManagers

2018-10-16 Thread Piotr Nowojski
Hi,

> In general, I'm not 100% sure whether spreading out tasks is always the
> best strategy. Especially if you have a network-heavy job, co-locating tasks
> on the same TM could have benefits over spreading the tasks out.

Spreading out tasks is definitely not always the best, but I would guess t

Re: Spreading Tasks across TaskManagers

2018-10-16 Thread Till Rohrmann
Yes, the ResourceSpec is not yet fully functional. The idea is to allow the user to specify how many resources an operator needs. Depending on these requirements, the RM should allocate slots that can fulfill them. Cheers, Till On Tue, Oct 16, 2018 at 2:29 PM Maximilian Michels wr

Re: Spreading Tasks across TaskManagers

2018-10-16 Thread Maximilian Michels
> the community is currently working on Flink's scheduler component [1]

That sounds great! I agree that spreading tasks across the nodes is not always desirable, but it would be nice to give users an option to provide hints to the scheduler. The location-aware bulk scheduling you mentioned would b

Re: Spreading Tasks across TaskManagers

2018-10-12 Thread Thomas Weise
Hi Till, Thanks for the pointer, glad that this is being worked on. It almost looks like the non-deterministic distribution behavior started with 1.5.x (?), and that surprised us. https://issues.apache.org/jira/browse/BEAM-5713 I agree that there is no one strategy that fits every use case. If a

Re: Spreading Tasks across TaskManagers

2018-10-12 Thread Till Rohrmann
Hi Max, the community is currently working on Flink's scheduler component [1]. One of the things we want to enable in the future is bulk scheduling. With this, it should also be possible to add strategies for how to distribute tasks across multiple TMs (spreading vs. co-locating). In general, I'm not

Spreading Tasks across TaskManagers

2018-10-11 Thread Maximilian Michels
Hi everyone, I've recently come across a cluster scheduling problem users are facing. Clusters where TaskManagers have more slots than the job parallelism (#tm_slots > job_parallelism) tend to schedule all job tasks on a single TaskManager. This is not good for spreading load and has been discu
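
To make the two placement strategies discussed in this thread concrete (filling one TaskManager first vs. spreading across TaskManagers), here is a toy sketch. It is not Flink's scheduler and does not use any Flink API; the function names and the simple "task index to TaskManager index" model are assumptions made purely for illustration.

```python
from collections import Counter


def assign_fill_first(num_tms, slots_per_tm, parallelism):
    """Hypothetical co-locating strategy: fill all slots of one
    TaskManager before using the next one."""
    if parallelism > num_tms * slots_per_tm:
        raise ValueError("not enough slots for the requested parallelism")
    # Task i lands on TaskManager i // slots_per_tm.
    return [task // slots_per_tm for task in range(parallelism)]


def assign_spread(num_tms, slots_per_tm, parallelism):
    """Hypothetical spreading strategy: hand out tasks round-robin
    across TaskManagers."""
    if parallelism > num_tms * slots_per_tm:
        raise ValueError("not enough slots for the requested parallelism")
    # Task i lands on TaskManager i % num_tms.
    return [task % num_tms for task in range(parallelism)]


def tasks_per_tm(assignment):
    """Count how many tasks each TaskManager index received."""
    return dict(sorted(Counter(assignment).items()))
```

With 4 TaskManagers of 8 slots each and a parallelism of 4 (the #tm_slots > job_parallelism case), `assign_fill_first(4, 8, 4)` puts all four tasks on TaskManager 0, while `assign_spread(4, 8, 4)` puts one task on each TaskManager. Which of the two is preferable depends on the job, as noted in the replies: a network-heavy job may benefit from co-location.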