Hi all, I'm testing how to speed up my Flink job, and I ran into the following situations on my *6-node* cluster (each node has 8 CPUs, and one node also runs the job manager):

Scenario 1:
- # of network buffers: 4096
- parallelism: 36
- *The job fails because there are not enough network buffers*

Scenario 2:
- # of network buffers: *8192*
- parallelism: 36
- *The job ends successfully in about 20 minutes*

Scenario 3:
- # of network buffers: *4096*
- 6 nodes
- parallelism: *6*
- *The job ends successfully in about 11 minutes*

What can I infer from these results? That my job is I/O bound, so having more threads on the same machine accessing the disk simultaneously degrades the performance of the pipeline?

Best,
Flavio
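
P.S. To make the per-node numbers explicit, here is a rough sketch of the arithmetic behind the scenarios. It assumes subtasks are spread evenly over the 6 nodes, and it assumes that network buffer demand grows roughly with (slots per task manager)^2 * (number of task managers), which is the rule of thumb I remember from the Flink configuration docs. Both are assumptions on my side, and the function names are only illustrative, not part of any Flink API.

```python
# Cluster size taken from the message: 6 nodes with 8 CPUs each.
NODES = 6
CPUS_PER_NODE = 8


def tasks_per_node(parallelism, nodes=NODES):
    """Parallel subtasks of one operator per node, assuming an even spread."""
    return -(-parallelism // nodes)  # ceiling division


def relative_buffer_demand(slots_per_tm, task_managers=NODES):
    """Relative (unitless) buffer demand under the assumed rule of thumb:
    slots_per_tm^2 * task_managers. Only ratios between runs are meaningful."""
    return slots_per_tm ** 2 * task_managers


high = tasks_per_node(36)  # scenarios 1 and 2
low = tasks_per_node(6)    # scenario 3

# Concurrent subtasks per node that may hit the local disk at the same time.
print("subtasks per node:", high, "vs", low)

# How much more buffer demand parallelism 36 creates compared to parallelism 6.
ratio = relative_buffer_demand(high) / relative_buffer_demand(low)
print("relative buffer demand ratio:", ratio)
```

If these assumptions hold, parallelism 36 puts 6 concurrent subtasks on each node instead of 1, which would fit both the buffer shortage in scenario 1 and the disk contention I suspect in scenario 2.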