[ https://issues.apache.org/jira/browse/FLINK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358897#comment-16358897 ]
ASF GitHub Bot commented on FLINK-8629:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/5446

    [FLINK-8629] [flip6] Allow JobMaster to rescale jobs

    ## What is the purpose of the change

    This commit adds the functionality to rescale a job or parts of it to the JobMaster. In order to rescale a job, the JobMaster does the following:

    1. Take a savepoint
    2. Create a rescaled ExecutionGraph from the JobGraph
    3. Initialize it with the taken savepoint
    4. Suspend the old ExecutionGraph
    5. Restart the new ExecutionGraph once the old ExecutionGraph has been suspended

    This PR is based on #5445, #5444, #4510

    ## Does this pull request potentially affect one of the following parts:

      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)

    ## Documentation

      - Does this pull request introduce a new feature? (yes)
      - If yes, how is the feature documented? (not documented)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink rescalingRpc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5446

----
commit ffc9edd8f41c4a8508170580f945c5b9ed911d01
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T13:37:15Z

    [hotfix] Fix checkstyle violations in ExecutionGraph

commit 38006cfd9fef14fa4aa0dc23cb6a4e4afd019006
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T17:04:06Z

    [FLINK-8627] Introduce new JobStatus#SUSPENDING

    The new JobStatus#SUSPENDING says that an ExecutionGraph has been suspended but its
    clean up has not been done yet. Only after all Executions have been canceled, the
    ExecutionGraph will enter the SUSPENDED state and complete the termination future
    accordingly.

commit b9c77594b98c8fe8799a7149fbcfad6157d7aa5e
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-09T13:07:31Z

    [FLINK-8626] Introduce BackPressureStatsTracker interface

    Renames BackPressureStatsTracker into BackPressureStatsTrackerImpl and introduces
    a BackPressureStatsTracker interface. This will make testing easier when we
    don't have to set up all the different components.

commit f0d7d8e69c16261f140faf2943fd15485837609b
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2017-08-10T05:41:40Z

    [FLINK-7124] [flip-6] Add test to verify rescaling JobGraphs works correctly

    This commit adds two tests to verify behaviours of rescaling JobGraphs:

    1. JobGraphs can be consecutively rescaled to arbitrary valid DOPs
    2. Rescaling beyond max parallelism would fail

    The second test, however, is temporarily disabled for now since it doesn't properly fail.

commit 2a473673e00d3ab7a2597eb5182b162f342c2d96
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T15:14:53Z

    [FLINK-8546] [flip6] Respect savepoints and restore from latest checkpoints

    Let the JobMaster respect checkpoints and savepoints. The JobMaster will always
    try to restore the latest checkpoint if there is one available. Next it will check
    whether savepoint restore settings have been set. If so, then it will try to restore
    the savepoint. Only if these settings are not set, the job will be started from scratch.

commit f0f24a2701298010fd2403a03b8a5ff98d41eb3c
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-09T13:18:11Z

    [hotfix] [tests] Simplify JobMasterTest

commit 930106c3383ea1179475e27ff608bf4df2ac0773
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2017-08-10T05:57:09Z

    [hotfix] Refactor graph verification code in ExecutionGraphConstructionTest

    The refactoring reuses utility methods in ExecutionGraphTestUtils to verify
    constructed ExecutionGraphs.

commit f902b9eb8776d7df8a9b62fa556756d00b3b4c15
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-02T08:57:14Z

    [FLINK-7124] Introduce parallelism <= max parallelism check into ExecutionJobVertex

    Check that the parallelism is no larger than the max parallelism when creating
    an ExecutionJobVertex.

commit 7c6a18e4fdcbec7d0cdf38d16baab699eec7b208
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T13:37:37Z

    [FLINK-8629] [flip6] Allow JobMaster to rescale jobs

    This commit adds the functionality to rescale a job or parts of it to the JobMaster.
    In order to rescale a job, the JobMaster does the following:

    1. Take a savepoint
    2. Create a rescaled ExecutionGraph from the JobGraph
    3. Initialize it with the taken savepoint
    4. Suspend the old ExecutionGraph
    5. Restart the new ExecutionGraph once the old ExecutionGraph has been suspended

----

> Allow JobMaster to rescale jobs
> -------------------------------
>
>                 Key: FLINK-8629
>                 URL: https://issues.apache.org/jira/browse/FLINK-8629
>             Project: Flink
>          Issue Type: New Feature
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> The {{JobMaster}} should be able to rescale a job or a subset of its
> operators. In order to do that we have to expose RPC calls to trigger this
> action.
> The rescaling works by first taking a savepoint, then suspending the old job,
> rescaling it, and then restarting it from the taken savepoint.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
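
For readers following the thread, the five rescaling steps listed in the pull request can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the pull request: `RescaleSketch`, `Graph`, `triggerSavepoint` and `rescale` are hypothetical names, the "graph" is a toy stand-in for Flink's ExecutionGraph, and the savepoint is just a string. The parallelism <= max parallelism check mirrors the one described in the FLINK-7124 commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class RescaleSketch {
    // Records the order of operations so the sequence can be inspected.
    static final List<String> log = new ArrayList<>();

    /** Hypothetical stand-in for an ExecutionGraph; not Flink's real class. */
    static final class Graph {
        final int parallelism;
        final String restoredFrom; // savepoint used for initialization, null if none
        String status = "CREATED";

        Graph(int parallelism, String restoredFrom) {
            this.parallelism = parallelism;
            this.restoredFrom = restoredFrom;
        }

        /** Suspends the graph; the returned future completes once it is suspended. */
        CompletableFuture<Void> suspend() {
            status = "SUSPENDED";
            log.add("suspend old graph (p=" + parallelism + ")");
            return CompletableFuture.completedFuture(null);
        }

        void start() {
            status = "RUNNING";
            log.add("start new graph (p=" + parallelism + ")");
        }
    }

    /** Hypothetical savepoint trigger; returns a fake savepoint handle. */
    static CompletableFuture<String> triggerSavepoint(Graph graph) {
        log.add("take savepoint");
        return CompletableFuture.completedFuture("savepoint-of-p" + graph.parallelism);
    }

    static CompletableFuture<Graph> rescale(Graph current, int newParallelism, int maxParallelism) {
        // Reject invalid targets up front (parallelism must be <= max parallelism).
        if (newParallelism < 1 || newParallelism > maxParallelism) {
            CompletableFuture<Graph> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalArgumentException(
                "parallelism " + newParallelism + " not in [1, " + maxParallelism + "]"));
            return failed;
        }
        return triggerSavepoint(current)                       // step 1
            .thenApply(savepoint -> {
                log.add("create rescaled graph");              // steps 2 and 3
                return new Graph(newParallelism, savepoint);
            })
            .thenCompose(next -> current.suspend()             // step 4
                .thenApply(ignored -> {
                    next.start();                              // step 5, only after suspension
                    return next;
                }));
    }

    public static void main(String[] args) {
        Graph old = new Graph(2, null);
        old.status = "RUNNING";
        Graph rescaled = rescale(old, 4, 128).join();
        log.forEach(System.out::println);
        System.out.println(old.status + " -> " + rescaled.status
            + ", restored from " + rescaled.restoredFrom);
    }
}
```

Chaining the restart behind the suspension future reflects the ordering constraint in step 5: the new graph is only started once the old one has finished suspending.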