[ https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658946#comment-16658946 ]
Till Rohrmann commented on FLINK-10640:
---------------------------------------

Thanks for opening this issue [~wuzang]. I agree with [~Tison] that such a new feature needs a bit more in-depth discussion. I would suggest starting with a design document that describes the intended changes in more detail.

> Enable Slot Resource Profile for Resource Management
> ----------------------------------------------------
>
>                 Key: FLINK-10640
>                 URL: https://issues.apache.org/jira/browse/FLINK-10640
>             Project: Flink
>          Issue Type: New Feature
>          Components: ResourceManager
>            Reporter: Tony Xintong Song
>            Priority: Major
>
> Motivation & Background
> * The existing concept of task slots roughly represents how many pipelines of
> tasks a TaskManager can hold. However, it does not consider the differences
> in resource needs and usage of individual tasks. Enabling resource profiles
> for slots may allow Flink to better allocate execution resources according to
> each task's fine-grained resource needs.
> * The community version of Flink already contains APIs and some implementation
> for slot resource profiles. However, this logic is not actually used.
> (The ResourceProfile of slot requests is by default set to UNKNOWN with
> negative values, and thus matches any given slot.)
> Preliminary Design
> * Slot Management
> A slot represents a certain amount of resources on a TaskManager in which a
> single pipeline of tasks runs. Initially, a TaskManager does not have any
> slots but a total amount of resources. When allocating, the ResourceManager
> finds suitable TMs and generates new slots on them for the tasks to run in,
> according to the slot requests. Once generated, a slot's size (resource
> profile) does not change until it is freed. The ResourceManager can apply
> different, portable strategies to allocate slots from TaskManagers.
> * TM Management
> The size and number of TaskManagers and when to start them can also be
> flexible. TMs can be started and released dynamically, and may have different
> sizes.
> We may have many different, portable strategies. E.g., an elastic
> session that can run multiple jobs like the session mode while dynamically
> adjusting the size of the session (number of TMs) according to the real-time
> workload.
> * About Slot Sharing
> Slot sharing is a good heuristic to easily calculate how many slots are needed
> to get the job running and to get better utilization when there is no resource
> profile in slots. However, with resource profiles enabling finer-grained
> resource management, each individual task has its specific resource needs and
> it does not make much sense to have multiple tasks sharing the resources of
> the same slot. Instead, we may introduce locality preferences/constraints to
> support the semantics of putting tasks in the same or different TMs in a more
> general way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
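
For readers following the discussion, the "Slot Management" item in the quoted proposal can be illustrated with a small sketch. This is not Flink code: the class names below (`ResourceProfile`, `TaskManager`, `ResourceManager`) only borrow the terms used in the issue, the two resource dimensions (CPU cores and memory in MB) are an assumption, and first-fit is just one possible allocation strategy picked for illustration. The sketch shows a TaskManager that starts with a total amount of resources and no slots, a ResourceManager that generates a slot on the first TM with enough free resources, and a slot size that stays fixed until the slot is freed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical, simplified profile: CPU cores and memory in MB."""
    cpu_cores: float
    memory_mb: int

    def fits_into(self, available: "ResourceProfile") -> bool:
        # A request fits if every dimension is within the available amount.
        return (self.cpu_cores <= available.cpu_cores
                and self.memory_mb <= available.memory_mb)


@dataclass
class TaskManager:
    """Starts with a total amount of resources and no slots."""
    tm_id: str
    total: ResourceProfile
    slots: Dict[int, ResourceProfile] = field(default_factory=dict)
    _next_slot_id: int = 0

    def free_resources(self) -> ResourceProfile:
        # Free resources = total minus the sum of all generated slots.
        used_cpu = sum(s.cpu_cores for s in self.slots.values())
        used_mem = sum(s.memory_mb for s in self.slots.values())
        return ResourceProfile(self.total.cpu_cores - used_cpu,
                               self.total.memory_mb - used_mem)

    def create_slot(self, request: ResourceProfile) -> Optional[int]:
        # Generate a slot sized exactly to the request, if it fits.
        if not request.fits_into(self.free_resources()):
            return None
        slot_id = self._next_slot_id
        self._next_slot_id += 1
        # The profile is frozen, so the slot size cannot change until freed.
        self.slots[slot_id] = request
        return slot_id

    def free_slot(self, slot_id: int) -> None:
        # Freeing a slot returns its resources to the TaskManager.
        del self.slots[slot_id]


class ResourceManager:
    """Allocates slots with a first-fit strategy (an illustrative choice)."""

    def __init__(self, task_managers: Iterable[TaskManager]) -> None:
        self.task_managers = list(task_managers)

    def allocate(self, request: ResourceProfile) -> Optional[Tuple[str, int]]:
        # Try each TM in order and take the first one with enough room.
        for tm in self.task_managers:
            slot_id = tm.create_slot(request)
            if slot_id is not None:
                return tm.tm_id, slot_id
        return None  # No TM can host the request; a real system might start a new TM.
```

A different strategy (for example best-fit, or one that prefers starting a new TM) would only need to replace `ResourceManager.allocate`, which is the kind of interchangeable strategy the proposal seems to have in mind.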