If I'm not mistaken, the scheduler already has a preference to spread independent pipelines out across the cluster. At least, it uses a queue of instances from which it pops the first element when it allocates a new slot. This instance is then appended to the queue again if it still has some resources (slots) left.
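In code, the rotation works roughly like this (a minimal sketch for illustration, not Flink's actual Scheduler class; the Instance type and field names are made up):

    import java.util.ArrayDeque;
    import java.util.Queue;

    class RoundRobinSlotAllocator {

        static final class Instance {
            final String host;
            int freeSlots;

            Instance(String host, int freeSlots) {
                this.host = host;
                this.freeSlots = freeSlots;
            }
        }

        // Queue of instances that still have free slots.
        private final Queue<Instance> queue = new ArrayDeque<>();

        RoundRobinSlotAllocator(Iterable<Instance> instances) {
            for (Instance i : instances) {
                if (i.freeSlots > 0) {
                    queue.add(i);
                }
            }
        }

        // Pop the first instance, take one slot from it, and append it
        // to the back of the queue again if it has slots left. Because
        // the instance goes to the back, consecutive allocations tend
        // to land on different machines.
        String allocateSlot() {
            Instance next = queue.poll();
            if (next == null) {
                throw new IllegalStateException("No instance with free slots");
            }
            next.freeSlots--;
            if (next.freeSlots > 0) {
                queue.add(next);
            }
            return next.host;
        }
    }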
I would assume that you have a shuffle operation involved in your job, such that it makes sense for the scheduler to deploy all pipelines to the same machine. (Stephan's slots-per-machine workaround comes down to a single config setting; see the sketch below the quoted thread.)

Cheers,
Till

On Dec 1, 2015 4:01 PM, "Stephan Ewen" <se...@apache.org> wrote:

> Slots are like "resource groups" which execute entire pipelines. They
> frequently have more than one operator.
>
> What you can try as a workaround is to decrease the number of slots per
> machine to cause the operators to be spread across more machines.
>
> If this is a crucial issue for your use case, it should be simple to add
> a "preference to spread out" to the scheduler...
>
> On Tue, Dec 1, 2015 at 3:26 PM, Kashmar, Ali <ali.kash...@emc.com> wrote:
>
> > Is there a way to make a task cluster-parallelizable? I.e., make sure
> > the parallel instances of the task are distributed across the cluster.
> > When I run my Flink job with a parallelism of 16, all the parallel
> > tasks are assigned to the first task manager.
> >
> > - Ali
> >
> > On 2015-11-30, 2:18 PM, "Ufuk Celebi" <u...@apache.org> wrote:
> >
> > >> On 30 Nov 2015, at 17:47, Kashmar, Ali <ali.kash...@emc.com> wrote:
> > >> Do the parallel instances of each task get distributed across the
> > >> cluster or is it possible that they all run on the same node?
> > >
> > > Yes, slots are requested from all nodes of the cluster. But keep in
> > > mind that multiple tasks (forming a local pipeline) can be scheduled
> > > to the same slot (1 slot can hold many tasks).
> > >
> > > Have you seen this?
> > > https://ci.apache.org/projects/flink/flink-docs-release-0.10/internals/job_scheduling.html
> > >
> > >> If they can all run on the same node, what happens when that node
> > >> crashes? Does the job manager recreate them using the remaining
> > >> open slots?
> > >
> > > What happens: the job manager tries to restart the program with the
> > > same parallelism. Thus, if you have enough free slots available in
> > > your cluster, this works smoothly (so yes, the remaining/available
> > > slots are used).
> > >
> > > With a YARN cluster the task manager containers are restarted
> > > automatically. In standalone mode, you have to take care of this
> > > yourself.
> > >
> > > Does this help?
> > >
> > > Ufuk
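The workaround Stephan describes maps to one setting in conf/flink-conf.yaml (a sketch, assuming standalone mode with one TaskManager per machine):

    # flink-conf.yaml
    # One slot per TaskManager: each TaskManager can then run only one
    # pipeline, so a job with parallelism 16 must be spread across 16
    # machines (assuming one TaskManager per machine).
    taskmanager.numberOfTaskSlots: 1

The TaskManagers have to be restarted for the new value to take effect.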