GitHub user sihuazhou opened a pull request: https://github.com/apache/flink/pull/4949
    [FLINK-7866] [runtime] Weigh list of preferred locations for scheduling

    ## What is the purpose of the change

    This PR fixes [FLINK-7866](https://issues.apache.org/jira/browse/FLINK-7866). Currently, the scheduler uses only the plain list of preferred locations to decide where to schedule a task. This can be improved by weighing the locations, which gives better locality in some cases. This PR also introduces `CandidateLocation` and `CandidateLocationEvaluator`, which make it possible to weigh locations for an `ExecutionVertex` by both state and input.

    A simple weighing example:

    - Preferred locations list: `[location1, location2, location2]`
    - Weighted preferred locations list: `[(location2, 2), (location1, 1)]`

    ## Brief change log

    - *Add `CandidateLocation` to represent a possible preferred location.*
    - *Add `CandidateLocationEvaluator` to evaluate a candidate location. Currently there are only the INPUT_ONLY and STATE_ONLY evaluators, but more can easily be added.*
    - *Add evaluation logic when allocating a slot for an `Execution`: it first gets a set of candidate locations, which are then scored by the evaluator, and finally returns a list of locations ordered by weight, descending.*

    ## Verifying this change

    This change added tests and can be verified as follows:

    - Added the `testCandidateLocationEvaluateResult` test in `ExecutionTest` to verify the evaluation logic.

    ## Does this pull request potentially affect one of the following parts:

    - Dependencies (does it add or upgrade a dependency): (no)
    - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
    - The serializers: (no)
    - The runtime per-record code paths (performance sensitive): (no)
    - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)

    ## Documentation

    - Does this pull request introduce a new feature? (no)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sihuazhou/flink weigh_location

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4949

----
commit 4b867961d1c2061372145ab74f5b3e88b177b91a
Author: summerleafs <summerle...@163.com>
Date:   2017-11-06T05:43:25Z

    introduce CandidateLocation and CandidateLocationEvaluator for weigh the preferred locations.

----
---
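
For readers following the weighing example in the description, here is a minimal, self-contained sketch of the idea: count how often each location occurs in the preferred list and return the distinct locations ordered by count, descending. This is not the code from the PR. The class name `WeighLocationsSketch` and the method `weigh` are hypothetical, and plain strings stand in for Flink's location type. It also does not model `CandidateLocation` or `CandidateLocationEvaluator`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and types are assumptions, not Flink APIs.
public class WeighLocationsSketch {

    /**
     * Counts the occurrences of each location and returns (location, weight)
     * entries sorted by weight in descending order.
     */
    static List<Map.Entry<String, Integer>> weigh(List<String> preferredLocations) {
        // LinkedHashMap keeps first-seen order, so ties stay deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String location : preferredLocations) {
            counts.merge(location, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> weighted = new ArrayList<>(counts.entrySet());
        // List.sort is stable: equal weights keep their first-seen order.
        weighted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return weighted;
    }

    public static void main(String[] args) {
        List<String> preferred = Arrays.asList("location1", "location2", "location2");
        // prints [location2=2, location1=1]
        System.out.println(weigh(preferred));
    }
}
```

In this sketch, ties keep their first-seen order. How the PR itself breaks ties, or combines state-based and input-based weights, is not stated in the description.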