Yes, that is definitely one possible explanation. Another one could be data skew: increased parallelism does not take work off the most overloaded partition, but it does reduce the memory available to that partition's task. The web dashboard should actually help you check that, since it shows how many records each parallel subtask sends and receives.
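As a back-of-the-envelope check of the memory side: network buffers are 32 KB each by default (taskmanager.network.bufferSizeInBytes) and are allocated per TaskManager at startup, so the relevant flink-conf.yaml entries for your scenario 2 would look roughly like this (a sketch; key names and defaults as of the Flink 0.10/1.0 era):

    taskmanager.numberOfTaskSlots: 6
    # 8192 buffers * 32 KB = 256 MB reserved per TaskManager
    taskmanager.network.numberOfBuffers: 8192
    # with 4096 buffers it would be 4096 * 32 KB = 128 MB instead

That memory comes out of what is otherwise available for sorting and hashing, which is one more reason why higher parallelism plus more buffers is not automatically faster.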
On Fri, Feb 5, 2016 at 3:34 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Sorry, I forgot to say that the numberOfTaskSlots is always 6.
>
> On Fri, Feb 5, 2016 at 3:32 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
>> Hi to all,
>>
>> I'm testing how to speed up my Flink job, and I ran into the following
>> situations on my *6-node* cluster (each node has 8 CPUs, and one node
>> also runs the job manager):
>>
>> Scenario 1:
>>
>> - # of network buffers: 4096
>> - parallelism: 36
>> - *The job fails because there are not enough network buffers*
>>
>> Scenario 2:
>>
>> - # of network buffers: *8192*
>> - parallelism: 36
>> - *The job finishes successfully in about 20 minutes*
>>
>> Scenario 3:
>>
>> - # of network buffers: *4096*
>> - 6 nodes
>> - parallelism: *6*
>> - *The job finishes successfully in about 11 minutes*
>>
>> What can I infer from these results? That my job is I/O bound, so
>> having more threads on the same machine accessing the disk
>> simultaneously degrades the pipeline's performance?
>>
>> Best,
>> Flavio
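P.S. If you want to iterate on the parallelism without editing flink-conf.yaml and restarting the cluster each time, it can also be set per job when submitting (a sketch; the jar name is a placeholder):

    bin/flink run -p 6 myJob.jar

The -p flag, like parallelism.default, only sets the job's default; individual operators can still override it via setParallelism().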