[ https://issues.apache.org/jira/browse/FLINK-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523444#comment-16523444 ]
Renjie Liu commented on FLINK-8886: ----------------------------------- [~till.rohrmann], [~elevy], [~mingleizhang] I've created an [issue|https://issues.apache.org/jira/browse/FLINK-9662] for the first part of problem and wrote a design doc for it, please help to review. > Job isolation via scheduling in shared cluster > ---------------------------------------------- > > Key: FLINK-8886 > URL: https://issues.apache.org/jira/browse/FLINK-8886 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination, Local Runtime, Scheduler > Affects Versions: 1.5.0 > Reporter: Elias Levy > Assignee: Renjie Liu > Priority: Major > > Flink's TaskManager executes tasks from different jobs within the same JVM as > threads. We prefer to isolate different jobs on their own JVM. Thus, we > must use different TMs for different jobs. As currently the scheduler will > allocate task slots within a TM to tasks from different jobs, that means we > must stand up one cluster per job. This is wasteful, as it requires at least > two JobManagers per cluster for high-availability, and the JMs have low > utilization. > Additionally, different jobs may require different resources. Some jobs are > compute heavy. Some are IO heavy (lots of state in RocksDB). At the moment > the scheduler threats all TMs are equivalent, except possibly in their number > of available task slots. Thus, one is required to stand up multiple cluster > if there is a need for different types of TMs. > It would be useful if one could specify requirements on job, such that they > are only scheduled on a subset of TMs. Properly configured, that would > permit isolation of jobs in a shared cluster and scheduling of jobs with > specific resource needs. > One possible implementation is to specify a set of tags on the TM config file > which the TMs used when registering with the JM, and another set of tags > configured within the job or supplied when submitting the job. The scheduler > could then match the tags in the job with the tags in the TMs. In a > restrictive mode the scheduler would assign a job task to a TM only if all > tags match. In a relaxed mode the scheduler could assign a job task to a TM > if there is a partial match, while giving preference to a more accurate match. -- This message was sent by Atlassian JIRA (v7.6.3#76005)