Hi folks,

I would like to do some load balancing within one of my Flink jobs to achieve 
good scalability. The rebalance() method is not applicable in my case, as the 
runtime is dominated by the processing of a few very large elements in my 
dataset. Hence, I need to distribute the processing work for these elements 
among the nodes in the cluster. To do so, I subdivide those elements into 
partial tasks and want to distribute these partial tasks to other nodes using 
a custom partitioner.
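
To illustrate what I have in mind, here is a minimal sketch of the partition 
logic (plain Java, so it can be read in isolation; the names elementId, 
subtaskIndex and isHeavy are my own assumptions). In the actual job this 
would live inside Flink's Partitioner<K> and be wired in via 
partitionCustom():

```java
// Sketch only: partial tasks of a "heavy" element are spread
// round-robin over all partitions, while ordinary elements keep a
// stable hash-based partition. This mirrors the contract of Flink's
// Partitioner<K>.partition(K key, int numPartitions).
public class PartialTaskPartitioner {

    // elementId    - id of the original element (assumed to be a long)
    // subtaskIndex - index of the partial task within a heavy element
    // isHeavy      - whether the element was subdivided
    public static int partition(long elementId, int subtaskIndex,
                                boolean isHeavy, int numPartitions) {
        if (isHeavy) {
            // distribute this element's partial tasks across all partitions
            return subtaskIndex % numPartitions;
        }
        // small elements: stable partition derived from the id, as a
        // plain hash partitioning would do
        return Math.floorMod(elementId, numPartitions) == 0
                ? 0
                : (int) Math.floorMod(elementId, numPartitions);
    }
}
```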

Now, my question is the following: I do not actually need to shuffle the 
complete dataset, only those few elements. So is there a way to tell the 
partitioner that the remaining data should stay on the same task manager? 
Thanks!

Cheers,
Sebastian
