Hi all, 

let's have a look at a simple Join with two DataSources and parallelism
p=5. 

The whole Job consists of 3 parts: 

1. DataSource Task 

2. Join Task 

3. DataSink Task 

In the first task, the data is provided and prepared for the Join task.
In particular each DataSource task creates a ResultPartition which is
divided into 5 subpartitions. Since 1/5 of the Join Task will be located
in the same node, one of these subpartitions does not have to be shipped
over the network. 

This one subpartition will be shipped to a LocalInputChannel (not
RemoteInputChannel) and therefore will not get in touch with the
network. 

Now I made some changes in the network part for my research and would
like them to affect all subpartitions. 

Question: 

Is there a feature build into flink to completely disable the local
stuff and send all subpartitions via network even if they have the same
location and destination? 

If not - does anyone have an idea where to tweak this? 

Thanks. 

Chris 

 

Reply via email to