Hello! I'm trying to figure out how it happens: I'm having a program reading from multiple socketTextStream and these text streams feed into different data flow (and these data streams never connect in my job). It looks something similar to below:
for(int i =0; i< hosts.length; i++) { DataStream<String> someStream = env.socketTextStream(hosts[i], ports[i]); DataStream<Tuple2<String, String>> joinedAdImpressions = rawMessageStream.rebalance() ... However, when I run the job on a cluster I found that all source task have been scheduled to one machine so the machine becomes a severe bottleneck for the performance. Any ideas how would this happen? Thanks!