I'm running a Flink cluster as a YARN application, started by:
./bin/yarn-session.sh -n 4 -jm 2048 -tm 4096 -d

There are 2 worker nodes, so each is allocated 2 task managers. There is a
stateful Flink job running on the cluster with a parallelism of 2.

If I now want to increase the number of worker nodes to 3, and add 2 extra
task managers, and then increase the job parallelism, how should I do this?

I'm using EMR, so adding an extra worker node and making it available to
YARN is easy to do via the AWS console. But I haven't been able to find any
information in Flink docs about how to resize a running Flink cluster on
YARN. Is it possible to resize it while the YARN application is running, or
do I need to stop the YARN application and redeploy the cluster? Also, do I
need to redeploy my Flink job from a savepoint to increase its parallelism,
or can I do this while the job is running?

After adding a third worker node, I tried redeploying the cluster via:

> yarn application -kill myflinkcluster

> ./bin/yarn-session.sh -n 6 -jm 2048 -tm 4096 -d

(Note the increase in the number of task managers from 4 to 6.)

Surprisingly, this redeployed a Flink cluster with 4 task managers (not 6!)
and restored my job from the last checkpoint.

Can anyone point me in the right direction?

Thanks,

Josh
