[ https://issues.apache.org/jira/browse/FLINK-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503536#comment-16503536 ]
Elias Levy commented on FLINK-8886: ----------------------------------- I am ok with having a separate issue for per job JVM isolation, but I am primarily interested on the per job TM isolation via scheduling to matching TMs. In practice that will give us JVM isolation without having to wait for anything else. > Job isolation via scheduling in shared cluster > ---------------------------------------------- > > Key: FLINK-8886 > URL: https://issues.apache.org/jira/browse/FLINK-8886 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination, Local Runtime, Scheduler > Affects Versions: 1.5.0 > Reporter: Elias Levy > Assignee: Renjie Liu > Priority: Major > > Flink's TaskManager executes tasks from different jobs within the same JVM as > threads. We prefer to isolate different jobs on their own JVM. Thus, we > must use different TMs for different jobs. As currently the scheduler will > allocate task slots within a TM to tasks from different jobs, that means we > must stand up one cluster per job. This is wasteful, as it requires at least > two JobManagers per cluster for high-availability, and the JMs have low > utilization. > Additionally, different jobs may require different resources. Some jobs are > compute heavy. Some are IO heavy (lots of state in RocksDB). At the moment > the scheduler threats all TMs are equivalent, except possibly in their number > of available task slots. Thus, one is required to stand up multiple cluster > if there is a need for different types of TMs. > It would be useful if one could specify requirements on job, such that they > are only scheduled on a subset of TMs. Properly configured, that would > permit isolation of jobs in a shared cluster and scheduling of jobs with > specific resource needs. > One possible implementation is to specify a set of tags on the TM config file > which the TMs used when registering with the JM, and another set of tags > configured within the job or supplied when submitting the job. The scheduler > could then match the tags in the job with the tags in the TMs. In a > restrictive mode the scheduler would assign a job task to a TM only if all > tags match. In a relaxed mode the scheduler could assign a job task to a TM > if there is a partial match, while giving preference to a more accurate match. -- This message was sent by Atlassian JIRA (v7.6.3#76005)