Hi Josh,

Yes, currently that is a reasonable workaround.
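For readers following along, the workaround being endorsed here is to submit the job with more parallelism than it needs today, so that spare subtasks exist to spread over TaskManagers added later. A minimal sketch, assuming a session with 4 TaskManagers: the total slot count (`-n` times `-s`) must cover the requested `-p`, and the jar name and numbers below are placeholders, not from the thread:

```shell
# Hypothetical sketch of the over-provisioning workaround (Flink-on-YARN,
# ~1.0/1.1 era as in this thread). Start the session with enough task slots
# to cover a parallelism well above current needs:
#   4 TaskManagers x 5 slots each = 20 slots
./bin/yarn-session.sh -n 4 -s 5 -jm 2048 -tm 4096 -d

# Submit the job at the over-provisioned parallelism. Extra TaskManagers
# added later (after a session restart) can then absorb the surplus subtasks.
# "my-job.jar" and -p 20 are placeholder values.
./bin/flink run -p 20 my-job.jar
```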
Best,
Marton

On Thu, Jun 30, 2016 at 12:38 PM, Josh <jof...@gmail.com> wrote:

> Hi Till,
>
> Thanks, that's very helpful!
> So I guess in that case, since it isn't possible to increase the job
> parallelism later, it might be sensible to use, say, 10x the parallelism
> that I need right now (even if only running on a couple of task managers),
> so that it's possible to scale the job in the future if I need to?
>
> Josh
>
> On Thu, Jun 30, 2016 at 11:17 AM, Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Josh,
>>
>> At the moment it is not possible to dynamically increase the parallelism
>> of your job. The same holds true for restarting a job from a savepoint.
>> But we're currently working on exactly this. So in order to change the
>> parallelism of your job, you would have to restart the job from scratch.
>>
>> Adding task managers dynamically to your running Flink cluster is
>> possible if you allocate new YARN containers and then start a TaskManager
>> process manually with the current job manager address and port. You can
>> find the address and port either in the web dashboard under the job
>> manager configuration, or in the .yarn-properties file stored in your
>> temp directory on your machine. This file also contains the job manager
>> address. But the easier way would be to stop your YARN session and then
>> restart it with an increased number of containers, because then you
>> wouldn't have to manually ship the lib directory, which might contain
>> user code classes.
>>
>> Cheers,
>> Till
>>
>> On Wed, Jun 29, 2016 at 10:13 PM, Josh <jof...@gmail.com> wrote:
>>
>>> I'm running a Flink cluster as a YARN application, started by:
>>>
>>> ./bin/yarn-session.sh -n 4 -jm 2048 -tm 4096 -d
>>>
>>> There are 2 worker nodes, so each is allocated 2 task managers. There
>>> is a stateful Flink job running on the cluster with a parallelism of 2.
>>>
>>> If I now want to increase the number of worker nodes to 3, add 2 extra
>>> task managers, and then increase the job parallelism, how should I do
>>> this?
>>>
>>> I'm using EMR, so adding an extra worker node and making it available to
>>> YARN is easy to do via the AWS console. But I haven't been able to find
>>> any information in the Flink docs about how to resize a running Flink
>>> cluster on YARN. Is it possible to resize it while the YARN application
>>> is running, or do I need to stop the YARN application and redeploy the
>>> cluster? Also, do I need to redeploy my Flink job from a savepoint to
>>> increase its parallelism, or can I do this while the job is running?
>>>
>>> I tried redeploying the cluster after adding a third worker node, via:
>>>
>>> yarn application -kill myflinkcluster
>>> ./bin/yarn-session.sh -n 6 -jm 2048 -tm 4096 -d
>>>
>>> (note increasing the number of task managers from 4 to 6)
>>>
>>> Surprisingly, this redeployed a Flink cluster with 4 task managers (not
>>> 6!) and restored my job from the last checkpoint.
>>>
>>> Can anyone point me in the right direction?
>>>
>>> Thanks,
>>> Josh
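The two resize options Till describes above can be sketched as follows. This is a hypothetical sketch, not from the thread: the JobManager hostname, port, and YARN application id are placeholders, and the manual option assumes the new node has the same Flink distribution (including the lib directory) installed:

```shell
# Option 1: attach an extra TaskManager to the running session by hand.
# On the new YARN container/node, point Flink at the current JobManager.
# The address/port come from the web dashboard (job manager configuration)
# or from the .yarn-properties file in your temp directory, e.g. set in
# conf/flink-conf.yaml on that node:
#   jobmanager.rpc.address: ip-10-0-0-1.ec2.internal   # placeholder host
#   jobmanager.rpc.port: 6123                          # placeholder port
./bin/taskmanager.sh start

# Option 2 (easier, per Till): stop the YARN session and restart it with
# more containers, so Flink ships the lib directory for you.
yarn application -kill application_1467_0001   # placeholder application id
./bin/yarn-session.sh -n 6 -jm 2048 -tm 4096 -d
```

Note that, as stated earlier in the thread, restarting the session alone does not rescale a job: at the time of this thread the job itself must be resubmitted from scratch to run at a higher parallelism.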