Hi all,

I’m busy tuning a workflow (defined with Cascading, planned with Flink) that 
runs on a 5-slave EMR cluster.

The default parallelism (from the Flink planner) is set to 40, since I’ve got 5 
task managers (one per node) and 8 slots/TM.
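For reference, here is the arithmetic behind that default as a tiny Java sketch. `SlotMath` and `defaultParallelism` are hypothetical names, not part of Flink or Cascading, and the sketch assumes the planner simply uses every available slot, which is what the 5 × 8 = 40 figure suggests.

```java
// Hypothetical helper (not a Flink or Cascading API): the default parallelism
// you get if every task slot in the cluster is used, i.e. TMs x slots per TM.
public final class SlotMath {
    private SlotMath() {}

    static int defaultParallelism(int taskManagers, int slotsPerTaskManager) {
        if (taskManagers <= 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return taskManagers * slotsPerTaskManager;
    }

    public static void main(String[] args) {
        // 5 task managers (one per EMR slave node), 8 slots each
        System.out.println(defaultParallelism(5, 8)); // prints 40
    }
}
```

Dropping to 4 slots per task manager would give a default of 20, i.e. 20 subtasks spread across the same 5 nodes.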

But this seems to jam things up: I see several GroupReduce subtasks running 
simultaneously and, apparently, competing for resources.

Any insight into how to tune this?

And what’s the right way to set it on a per-subtask basis? With Cascading Flows 
planned for MapReduce, I can set the number of reducers per step (to use 
Cascading lingo) via a Hadoop JobConf configuration setting. But a Flow 
planned for Flink has only a single “step”.

Thanks,

— Ken
