Is there an easy way to understand if and when my data get skewed in the pipeline?
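Right now the only thing that comes to my mind is to run a small side job that counts the records per key and prints the heaviest ones, and then compare that with what the dashboard shows per subtask. Below is a minimal sketch of that idea (the class name and the Tuple2<String, Integer> source are just placeholders, not my real job):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class KeySkewCheck {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in source: in the real job this would be the DataSet that
        // feeds the skewed groupBy/join.
        DataSet<Tuple2<String, Integer>> input = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));

        // Count the records per key.
        DataSet<Tuple2<String, Long>> keyCounts = input
                .map(new MapFunction<Tuple2<String, Integer>, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(Tuple2<String, Integer> value) {
                        return new Tuple2<>(value.f0, 1L);
                    }
                })
                .groupBy(0)
                .sum(1);

        // Print the 20 heaviest keys: a few keys dominating the counts is a sign of skew.
        keyCounts.sortPartition(1, Order.DESCENDING)
                .setParallelism(1)
                .first(20)
                .print();
    }
}

If a handful of keys dominates the output, the operators grouping on that field are the skewed ones.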
On Fri, Feb 5, 2016 at 4:09 PM, Stephan Ewen <se...@apache.org> wrote:

> Yes, that is definitely one possible explanation.
>
> Another one could be data skew: increasing the parallelism does not take
> work off the most overloaded partition (but it does reduce the memory
> available to that partition).
> The web dashboard should actually help you with checking that.
>
>
> On Fri, Feb 5, 2016 at 3:34 PM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> Sorry, I forgot to say that numberOfTaskSlots is always 6.
>>
>> On Fri, Feb 5, 2016 at 3:32 PM, Flavio Pompermaier <pomperma...@okkam.it>
>> wrote:
>>
>>> Hi to all,
>>>
>>> I'm testing how to speed up my Flink job and I ran into the following
>>> situations in my *6 node* cluster (each node has 8 CPUs, and one of them
>>> also runs the job manager):
>>>
>>> Scenario 1:
>>>
>>>    - # of network buffers: 4096
>>>    - parallelism: 36
>>>    - *The job fails because there are not enough network buffers*
>>>
>>> Scenario 2:
>>>
>>>    - # of network buffers: *8192*
>>>    - parallelism: 36
>>>    - *The job ends successfully in about 20 minutes*
>>>
>>> Scenario 3:
>>>
>>>    - # of network buffers: *4096*
>>>    - 6 nodes
>>>    - parallelism: *6*
>>>    - *The job ends successfully in about 11 minutes*
>>>
>>> What can I infer from these results? That my job is I/O bound, so having
>>> more threads on the same machine accessing the disk simultaneously
>>> degrades the performance of the pipeline?
>>>
>>> Best,
>>> Flavio
>>>
>>
>>
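For reference, these are the flink-conf.yaml entries that change between the scenarios quoted above (a minimal sketch: only the keys mentioned in this thread are shown, everything else stays at its default, and the parallelism could just as well be passed with -p at job submission instead):

# Scenario 2 (~20 minutes)
taskmanager.numberOfTaskSlots: 6
taskmanager.network.numberOfBuffers: 8192
parallelism.default: 36

# Scenario 3 (~11 minutes)
taskmanager.numberOfTaskSlots: 6
taskmanager.network.numberOfBuffers: 4096
parallelism.default: 6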