Hi folks,

I would like to do some load balancing within one of my Flink jobs to achieve good scalability. The rebalance() method is not applicable in my case, because the runtime is dominated by the processing of very few, comparatively large elements in my dataset. Hence, I need to distribute the processing work for these elements among the nodes in the cluster. To do so, I subdivide those elements into partial tasks and want to distribute these partial tasks to other nodes using a custom partitioner.
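
Here is a rough sketch of the partitioner I have in mind; the key type and field names are just placeholders for illustration. The partial tasks of the large elements are simply spread round-robin over all channels, everything else is hash-partitioned:

    import org.apache.flink.api.common.functions.Partitioner;

    public class SkewAwarePartitioner implements Partitioner<SkewAwarePartitioner.SplitKey> {

        // Placeholder key: one record per partial task of an element.
        public static class SplitKey {
            public String elementId;  // id of the original element
            public int subTaskIndex;  // index of the partial task within that element
            public boolean heavy;     // true for the few very large elements

            public SplitKey() {}
        }

        @Override
        public int partition(SplitKey key, int numPartitions) {
            if (key.heavy) {
                // spread the partial tasks of a large element over all partitions
                return key.subTaskIndex % numPartitions;
            }
            // all other elements: plain hash partitioning -- no locality guarantee,
            // which is exactly what my question below is about
            return (key.elementId.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

I would then wire this in via something like dataSet.partitionCustom(new SkewAwarePartitioner(), keySelector).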
Now, my question is the following: I actually do not need to shuffle the complete dataset, only those few elements. So is there a way to tell the partitioner that the remaining data should stay on the task manager it already resides on?

Thanks!

Cheers,
Sebastian