I am using the async IO operator. The problem is that increasing the source parallelism from 1 to 2 was enough to tip our systems over the edge. Reducing the parallelism of the async IO operator to 2 is not an option either, as that would cut throughput quite a bit. So no matter what we do, we'll end up with different operators running at different parallelism.
What I meant by "running all operators at such a high scale would result in wastage of resources, even with operator chaining in place" was that giving every one of my operators as many subtasks as the windowing operator would lead to sub-optimal performance. While chaining would keep the chained tasks in one slot, the data still gets repartitioned before the window, so the network IO stays the same; chaining doesn't guarantee that a given tuple is processed entirely in one slot. In my experience, running every operator at the same parallelism is always inferior to hand-tuned parallelism.
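To make it concrete, here is a rough sketch of the kind of hand-tuned setup I mean. The class name, the sequence source, and the parallelism numbers (2 / 16 / 32) are all made up for illustration; the real job uses our own source and async lookup, so treat this as a shape of the pipeline rather than the actual code:

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class HandTunedParallelismSketch {

    // Hypothetical async enrichment; stands in for the real external lookup.
    static class MyAsyncLookup extends RichAsyncFunction<Long, String> {
        @Override
        public void asyncInvoke(Long input, ResultFuture<String> resultFuture) {
            CompletableFuture
                .supplyAsync(() -> "key-" + (input % 100)) // placeholder for the real async IO call
                .thenAccept(r -> resultFuture.complete(Collections.singleton(r)));
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source kept at low parallelism: going above 2 tips the upstream system over.
        DataStream<Long> source = env
            .fromSequence(0L, 1_000_000L)
            .setParallelism(2);

        // Async IO operator scaled up independently to keep throughput.
        DataStream<String> enriched = AsyncDataStream
            .unorderedWait(source, new MyAsyncLookup(), 5, TimeUnit.SECONDS, 100)
            .setParallelism(16);

        // The keyBy repartitions the records anyway, so the window operator can get its
        // own (higher) parallelism without losing anything to a broken chain.
        enriched
            .keyBy(v -> v)
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .reduce((a, b) -> a + b)
            .setParallelism(32)
            .print()
            .setParallelism(2);

        env.execute("hand-tuned parallelism sketch");
    }
}

The point is that the keyBy in front of the window repartitions the data regardless, so forcing the source and async stage up to the window's parallelism just to keep a chain intact doesn't save any network IO; it only wastes slots.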