Cheers for the quick reply Till.

That would be very useful information to have! I'll upgrade my project to
Flink 0.10.1 tonight and let you know if I can find out whether there's a
skew in the data :-)

- Pieter


2016-01-27 13:49 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:

> Could it be that your data is skewed? This could lead to different loads
> on different task managers.
>
> With the latest Flink version, the web interface should show you how many
> bytes each operator has written and received. There you could see if one
> operator receives more elements than the others.
>
> Cheers,
> Till
>
> On Wed, Jan 27, 2016 at 1:35 PM, Pieter Hameete <phame...@gmail.com>
> wrote:
>
>> Hi guys,
>>
>> Currently I am running a job in the GCloud in a configuration with 4 task
>> managers that each have 4 CPUs (for a total parallelism of 16).
>>
>> However, I noticed my job is running much slower than expected and after
>> some more investigation I found that one of the workers is doing a majority
>> of the work (its CPU load was at 100% while the others were almost idle).
>>
>> My job execution plan can be found here: http://i.imgur.com/fHKhVFf.png
>>
>> The input is split into multiple files so loading the data is properly
>> distributed over the workers.
>>
>> I am wondering if you can provide me with some tips on how to figure out
>> what is going wrong here:
>>
>>    - Could this imbalance in workload be the result of an imbalance in
>>    the hash partitioning?
>>    - Is there a convenient way to see how many elements each worker gets
>>    to process? Would it work to write the output of the CoGroup to disk,
>>    since each worker writes to its own output file, and then compare the
>>    file sizes?
>>    - Is there something strange about the execution plan that could
>>    cause this?
>>
>> Thanks and kind regards,
>>
>> Pieter
>>
>
>
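[Editor's note: the skew suspected in the first bullet above can be illustrated with a small sketch. If one key dominates the data set, hash partitioning sends nearly all records to a single parallel instance, which matches the reported symptom of one worker at 100% CPU while the rest sit idle. This is a hypothetical plain-Python simulation, not Flink API code; the key names and counts are invented for illustration.]

```python
# Hypothetical sketch (not Flink API): simulate how a skewed key
# distribution maps onto 16 parallel channels under hash partitioning.
from collections import Counter

parallelism = 16  # 4 task managers x 4 slots, as in the thread

# Assumed skew: one "hot" key accounts for 90% of the records.
keys = ["hot"] * 900 + [f"key{i}" for i in range(100)]

# hash(key) % parallelism picks the receiving parallel instance.
channels = Counter(hash(k) % parallelism for k in keys)

# One channel receives at least the 900 "hot" records while the other
# 15 share the remaining 100 -- one busy worker, the rest nearly idle.
print(sorted(channels.values(), reverse=True))
```

The same counting idea answers the second bullet: tallying records per parallel instance (or per output file) makes the imbalance visible without relying on the web dashboard.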
