We have been working on an adaptive load balancing strategy that would address exactly the issue you point out. FLINK-1725 is the starting point for the integration.
Cheers,

--
Gianmarco

On 9 June 2015 at 20:31, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Sebastian,
>
> I agree, shuffling only specific elements would be a very useful feature,
> but unfortunately it's not supported (yet).
> Would you like to open a JIRA for that?
>
> Cheers, Fabian
>
> 2015-06-09 17:22 GMT+02:00 Kruse, Sebastian <sebastian.kr...@hpi.de>:
>
>> Hi folks,
>>
>> I would like to do some load balancing within one of my Flink jobs to
>> achieve good scalability. The rebalance() method is not applicable in my
>> case, as the runtime is dominated by the processing of very few larger
>> elements in my dataset. Hence, I need to distribute the processing work for
>> these elements among the nodes in the cluster. To do so, I subdivide those
>> elements into partial tasks and want to distribute these partial tasks to
>> other nodes by employing a custom partitioner.
>>
>> Now, my question is the following: Actually, I do not need to shuffle the
>> complete dataset but only a few elements. So is there a way of telling
>> within the partitioner, that data should reside on the same task manager?
>> Thanks!
>>
>> Cheers,
>>
>> Sebastian
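
For readers following the thread, here is a minimal sketch of the routing idea in the original question. It is not taken from any of the messages above and does not use Flink itself. The local Partitioner interface is only a stand-in with the same shape as Flink's Partitioner<K> (a single partition(key, numPartitions) method), and RoutingKey and StayLocalPartitioner are hypothetical names. The sketch assumes each element is tagged with the index of the parallel task that currently holds it. It only expresses where each element should go. As Fabian notes, the runtime does not support shuffling only specific elements, so returning the source index does not by itself guarantee that the element stays on the same task manager.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalityPartitionerSketch {

    /** Stand-in with the same shape as Flink's Partitioner<K>; not the real class. */
    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    /** Hypothetical routing key attached to every element before partitioning. */
    static final class RoutingKey {
        final int sourcePartition; // index of the parallel task holding the element (assumed known)
        final int splitIndex;      // -1 for ordinary elements, >= 0 for partial tasks of a large element

        RoutingKey(int sourcePartition, int splitIndex) {
            this.sourcePartition = sourcePartition;
            this.splitIndex = splitIndex;
        }
    }

    /** Ordinary elements keep their partition index; partial tasks are spread round-robin. */
    static final class StayLocalPartitioner implements Partitioner<RoutingKey> {
        @Override
        public int partition(RoutingKey key, int numPartitions) {
            if (key.splitIndex < 0) {
                // Ordinary element: ask for the partition it already lives in.
                return key.sourcePartition % numPartitions;
            }
            // Partial task: start at the neighbouring partition and cycle through all of them.
            return (key.sourcePartition + 1 + key.splitIndex) % numPartitions;
        }
    }

    public static void main(String[] args) {
        Partitioner<RoutingKey> partitioner = new StayLocalPartitioner();
        int numPartitions = 4;

        List<RoutingKey> keys = new ArrayList<>();
        keys.add(new RoutingKey(2, -1));      // a small element held by task 2
        for (int split = 0; split < 4; split++) {
            keys.add(new RoutingKey(2, split)); // partial tasks of one large element from task 2
        }

        for (RoutingKey key : keys) {
            System.out.println("source=" + key.sourcePartition
                    + " split=" + key.splitIndex
                    + " -> partition " + partitioner.partition(key, numPartitions));
        }
    }
}
```

In a real job the source index would have to be attached upstream, for example in a rich function that reads its own subtask index from the runtime context. Whether that index corresponds to a local, network-free hand-over depends on how the runtime schedules the tasks, which this sketch does not control.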