GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/5561
[FLINK-8748] [flip6] Cancel slot allocations for alternatively completed slot requests

## What is the purpose of the change

If a slot request is fulfilled with a different AllocatedSlot in the SlotPool, then we cancel the slot request sent to the ResourceManager.

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink slotPoolCancelSlotRequests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5561.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5561

----
commit 8cb296e2a5c9ddc2234e04791f7aab8eaec73b1b
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-21T14:57:50Z

    [FLINK-8732] [flip6] Cancel ongoing scheduling operation

    Keeps track of ongoing scheduling operations in the ExecutionGraph and cancels
    them in case of a concurrent cancel, suspend or fail call. This makes sure that
    the original cause for termination is maintained.

    This closes #5548.

commit b1dd80ccbe2c2a71758524d5d6f0ffa5fdd84a30
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T13:37:15Z

    [hotfix] Fix checkstyle violations in ExecutionGraph

commit f17c50bd5c36270928267e3ed6ca6fb2ffea0ccc
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T17:04:06Z

    [FLINK-8627] Introduce new JobStatus#SUSPENDING to ExecutionGraph

    The new JobStatus#SUSPENDING says that an ExecutionGraph has been suspended but
    its clean up has not been done yet. Only after all Executions have been
    canceled, the ExecutionGraph will enter the SUSPENDED state and complete the
    termination future accordingly.

    This closes #5445.

commit 0a6973ba32c0bd1a3e8a3f0af3ed2bac7e4917d9
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-13T15:14:41Z

    [FLINK-8629] [flip6] Allow JobMaster to rescale jobs

    This commit adds the functionality to rescale a job or parts of it to
    the JobMaster. In order to rescale a job, the JobMaster does the following:

    1. Take a savepoint
    2. Create a rescaled ExecutionGraph from the JobGraph
    3. Initialize it with the taken savepoint
    4. Suspend the old ExecutionGraph
    5. Restart the new ExecutionGraph once the old ExecutionGraph has been suspended

    This closes #5446.

commit 9c29e815b960796c33511a14483848f52a2454c5
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-13T15:34:31Z

    [FLINK-8633] [flip6] Expose rescaling of jobs via the Dispatcher

    This commit exposes the JobMaster#rescaleJob via the Dispatcher. This will
    allow it to call this functionality from a REST handler.

    This closes #5452.

commit b3e65c6914970bdce20b1fa572655403200ae2a1
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-02T10:06:35Z

    [FLINK-8634] [rest] Introduce job rescaling REST handler

    Add rescaling REST handler as a sub class of the
    AbstractAsynchronousOperationHandlers.

    This closes #5451.

commit 608a9204be0fcec8ba771ca3688586deadbadc5e
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-11T18:50:46Z

    [FLINK-8635] [rest] Register rescaling handlers at web endpoint

    This closes #5454.

commit cd27bf03a954c23c3879f81eadfb4af89f2e4a91
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-13T16:29:32Z

    [FLINK-8656] [flip6] Add modify CLI command to rescale Flink jobs

    Jobs can now be rescaled by calling flink modify <JOB_ID> -p <PARALLELISM>.
    Internally, the CliFrontend will send the corresponding REST call and poll
    for status updates.

    This closes #5487.

commit 16e88e61aa0172e9de59cfa3756f230c045777a4
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-22T11:36:53Z

    [FLINK-8746] [flip6] Allow rescaling of partially running jobs

    This commit enables the rescaling of Flink jobs which are currently not fully
    deployed. In such a case, Flink will use the last internal rescaling savepoint.
    If there is no such savepoint, then it will use the provided savepoint when the
    job was submitted. In case that there is no savepoint at all, then it will
    restart the job with vanilla state.

commit 0ac1b3dabb4e73d08d2198ab56b961201b1e87cf
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-22T13:12:48Z

    [hotfix] Register job status listener for rescaled job

commit 3a09100df0013eb0abec255efb4a4e09fccf1903
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-22T13:10:29Z

    [FLINK-8748] [flip6] Cancel slot allocations for alternatively completed slot requests

    If a slot request is fulfilled with a different AllocatedSlot in the SlotPool,
    then we cancel the slot request sent to the ResourceManager.

commit 28fe5008d3e2dc8b98d6dd2e947eec1ce3ee1941
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-22T14:37:28Z

    [hotfix] Avoid redundant slot release operations

commit 970fae405fb00f5e56481d72ee247cdedb5c4d57
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-22T14:37:37Z

    [hotfix] Cancel pending slot request when SlotPool is suspended

----

---
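
For readers unfamiliar with the behavior described in the FLINK-8748 commit, the idea can be illustrated with a small, self-contained sketch. This is not the code from the pull request: `SlotPoolSketch`, `ResourceManagerStub`, and the string-typed request and allocation IDs are hypothetical stand-ins chosen for illustration, and the real SlotPool is asynchronous and considerably more involved. The sketch only shows the rule itself: when a pending request is completed by a slot other than the one requested from the ResourceManager, the outstanding request is cancelled there.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the ResourceManager side of the protocol. */
interface ResourceManagerStub {
    void cancelSlotRequest(String allocationId);
}

/** Simplified, hypothetical model of a slot pool that tracks pending slot requests. */
class SlotPoolSketch {
    // Maps a pending request ID to the allocation ID that was requested
    // from the ResourceManager for it.
    private final Map<String, String> pendingRequests = new HashMap<>();
    private final ResourceManagerStub resourceManager;

    SlotPoolSketch(ResourceManagerStub resourceManager) {
        this.resourceManager = resourceManager;
    }

    /** Records a request that has been sent to the ResourceManager. */
    void requestSlot(String requestId, String allocationId) {
        pendingRequests.put(requestId, allocationId);
    }

    /**
     * Completes a pending request with the slot identified by offeredAllocationId.
     * If that slot is not the one originally requested, the outstanding
     * ResourceManager request is no longer needed and is cancelled.
     *
     * @return true if a ResourceManager request was cancelled
     */
    boolean fulfill(String requestId, String offeredAllocationId) {
        String requestedAllocationId = pendingRequests.remove(requestId);
        if (requestedAllocationId == null) {
            return false; // unknown or already completed request
        }
        if (!requestedAllocationId.equals(offeredAllocationId)) {
            resourceManager.cancelSlotRequest(requestedAllocationId);
            return true;
        }
        return false;
    }
}
```

In this sketch, fulfilling a request registered under allocation `alloc-A` with a slot carrying a different allocation ID triggers exactly one `cancelSlotRequest("alloc-A")` call, while fulfilling it with the originally requested slot triggers none.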