Re: Imbalanced workload between workers

2016-01-27 Thread Stephan Ewen
Hi Pieter! Interesting, but good :-) I don't think we did much on the hash functions since 0.9.1. I am a bit surprised that it made such a difference. Well, as long as it improves with the newer version :-) Greetings, Stephan On Wed, Jan 27, 2016 at 9:42 PM, Pieter Hameete wrote: > Hi Till,

Re: Imbalanced workload between workers

2016-01-27 Thread Pieter Hameete
Hi Till, I've upgraded to Flink 0.10.1 and ran the job again without any changes to the code to see the bytes input and output of the operators and for the different workers. To my surprise it is very well balanced between all workers and because of this the job completed much faster. Are there an

Re: Imbalanced workload between workers

2016-01-27 Thread Pieter Hameete
Cheers for the quick reply Till. That would be very useful information to have! I'll upgrade my project to Flink 0.10.1 tonight and let you know if I can find out if there's a skew in the data :-) - Pieter 2016-01-27 13:49 GMT+01:00 Till Rohrmann : > Could it be that your data is skewed? This c

Re: Imbalanced workload between workers

2016-01-27 Thread Till Rohrmann
Could it be that your data is skewed? This could lead to different loads on different task managers. With the latest Flink version, the web interface should show you how many bytes each operator has written and received. There you could see if one operator receives more elements than the others.
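
To illustrate the skew check suggested above, here is a minimal sketch in plain Python (not Flink code). It assumes records are routed to parallel subtasks by hashing a key, and it uses CRC32 as a stand-in hash, which is not the function Flink uses. The names `partition_of`, `load_per_partition` and `imbalance` are hypothetical helpers made up for this example. It counts how many records land on each of the 16 subtasks and reports the ratio of the busiest subtask to the average.

```python
import zlib
from collections import Counter


def partition_of(key: str, parallelism: int) -> int:
    """Map a key to a subtask index. CRC32 is a stand-in, not Flink's hash."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def load_per_partition(keys, parallelism: int):
    """Return the number of records each subtask would receive."""
    counts = Counter(partition_of(k, parallelism) for k in keys)
    return [counts.get(p, 0) for p in range(parallelism)]


def imbalance(loads) -> float:
    """Ratio of the busiest subtask to the mean load (1.0 means perfectly even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


if __name__ == "__main__":
    parallelism = 16  # 4 task managers x 4 slots, as in the original question

    # Hypothetical data sets: many distinct keys vs. one dominant key.
    uniform_keys = [f"key-{i}" for i in range(16000)]
    skewed_keys = ["hot-key"] * 12000 + [f"key-{i}" for i in range(4000)]

    for name, keys in (("uniform", uniform_keys), ("skewed", skewed_keys)):
        loads = load_per_partition(keys, parallelism)
        print(name, "loads:", loads, "imbalance:", round(imbalance(loads), 2))
```

If one key dominates the input, every record with that key goes to the same subtask no matter which hash function is used, so the imbalance stays high. A high ratio with many distinct keys would instead point at the partitioning itself.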

Imbalanced workload between workers

2016-01-27 Thread Pieter Hameete
Hi guys, Currently I am running a job in the GCloud in a configuration with 4 task managers that each have 4 CPUs (for a total parallelism of 16). However, I noticed my job is running much slower than expected and after some more investigation I found that one of the workers is doing a majority o