Hi Josh,

at the moment it is not possible to dynamically increase the parallelism of
your job. The same holds true for a restarting a job from a savepoint. But
we're currently working on exactly this. So in order to change the
parallelism of your job, you would have to restart the job from scratch.

Adding task managers dynamically to your running Flink cluster, is possible
if you allocate new YARN containers and then start a TaskManager process
manually with the current job manager address and port. You can either find
the address and port out using the web dashboard under job manager
configuration or you look up the .yarn-properties file which is stored in
your temp directory on your machine. This file also contains the job
manager address. But the easier way would be to stop your yarn session and
then restart it with an increased number of containers. Because then, you
wouldn't have to ship the lib directory, which might contain user code
classes, manually.

Cheers,
Till

On Wed, Jun 29, 2016 at 10:13 PM, Josh <jof...@gmail.com> wrote:

> I'm running a Flink cluster as a YARN application, started by:
> ./bin/yarn-session.sh -n 4 -jm 2048 -tm 4096 -d
>
> There are 2 worker nodes, so each are allocated 2 task managers. There is
> a stateful Flink job running on the cluster with a parallelism of 2.
>
> If I now want to increase the number of worker nodes to 3, and add 2 extra
> task managers, and then increase the job parallelism, how should I do this?
>
> I'm using EMR, so adding an extra worker node and making it available to
> YARN is easy to do via the AWS console. But I haven't been able to find any
> information in Flink docs about how to resize a running Flink cluster on
> YARN. Is it possible to resize it while the YARN application is running, or
> do I need to stop the YARN application and redeploy the cluster? Also do I
> need to redeploy my Flink job from a savepoint to increase its parallelism,
> or do I do this while the job is running?
>
> I tried redeploying the cluster having added a third worker node, via:
>
> > yarn application -kill myflinkcluster
>
> > ./bin/yarn-session.sh -n 6 -jm 2048 -tm 4096 -d
>
> (note increasing the number of task managers from 4 to 6)
>
> Surprisingly, this redeployed a Flink cluster with 4 task mangers (not 6!)
> and restored my job from the last checkpoint.
>
> Can anyone point me in the right direction?
>
> Thanks,
>
> Josh
>
>
>
>

Reply via email to