Hi guys,

Currently I am running a job in the GCloud in a configuration with 4 task
managers that each have 4 CPUs (for a total parallelism of 16).

However, I noticed my job is running much slower than expected and after
some more investigation I found that one of the workers is doing a majority
of the work (its CPU load was at 100% while the others were almost idle).

My job execution plan can be found here: http://i.imgur.com/fHKhVFf.png

The input is split into multiple files so loading the data is properly
distributed over the workers.

I am wondering if you can provide me with some tips on how to figure out
what is going wrong here:

   - Could this imbalance in workload be the result of an imbalance in the
   hash paritioning?
   - Is there a convenient way to see how many elements each worker gets to
      process? Would it work to write the output of the CoGroup to disk because
      each worker writes to its own output file and investigate the differences?
   - Is there something strange about the execution plan that could cause
   this?

Thanks and kind regards,

Pieter

Reply via email to