I'm running a Flink cluster as a YARN application, started by:

./bin/yarn-session.sh -n 4 -jm 2048 -tm 4096 -d

There are 2 worker nodes, so each is allocated 2 task managers. There is a stateful Flink job running on the cluster with a parallelism of 2.

If I now want to increase the number of worker nodes to 3, add 2 extra task managers, and then increase the job parallelism, how should I do this?

I'm using EMR, so adding an extra worker node and making it available to YARN is easy to do via the AWS console. But I haven't been able to find any information in the Flink docs about how to resize a running Flink cluster on YARN. Is it possible to resize it while the YARN application is running, or do I need to stop the YARN application and redeploy the cluster? Also, do I need to redeploy my Flink job from a savepoint to increase its parallelism, or can I do this while the job is running?

I tried redeploying the cluster after adding a third worker node, via:

> yarn application -kill myflinkcluster
> ./bin/yarn-session.sh -n 6 -jm 2048 -tm 4096 -d

(note that the number of task managers increases from 4 to 6)

Surprisingly, this redeployed a Flink cluster with 4 task managers (not 6!) and restored my job from the last checkpoint.

Can anyone point me in the right direction?

Thanks,
Josh