[ https://issues.apache.org/jira/browse/FLINK-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels closed FLINK-3543.
-------------------------------------
    Resolution: Implemented

Closing this because we have the ResourceManager in place.

> Introduce ResourceManager component
> -----------------------------------
>
>                 Key: FLINK-3543
>                 URL: https://issues.apache.org/jira/browse/FLINK-3543
>             Project: Flink
>          Issue Type: New Feature
>          Components: JobManager, ResourceManager, TaskManager
>    Affects Versions: 1.1.0
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>             Fix For: 1.1.0
>
>         Attachments: ResourceManagerSketch.pdf
>
>
> So far, the JobManager has been the central instance responsible for
> resource management and allocation.
>
> While thinking about how to integrate Mesos support in Flink, people from
> the Flink community realized that it would be nice to delegate resource
> allocation to a dedicated process. This process may run independently of
> the JobManager, which is a requirement for proper integration of cluster
> allocation frameworks like Mesos.
>
> This has led to the idea of creating a new component called the
> {{ResourceManager}}. Its task is to allocate and maintain resources
> requested by the {{JobManager}}. The ResourceManager has a very abstract
> notion of resources.
>
> Initially, we thought we could make the ResourceManager deal with resource
> allocation and the registration/supervision of the TaskManagers. However,
> this approach proved to add unnecessary complexity to the runtime: the
> registration state of TaskManagers had to be kept in sync at both the
> JobManager and the ResourceManager.
>
> That's why [~StephanEwen] and I changed the ResourceManager's role to
> simply deal with resource acquisition. The TaskManagers still register
> with the JobManager, which informs the ResourceManager about the
> successful registration of a TaskManager. The ResourceManager may inform
> the JobManager of failed TaskManagers.
> Because the ResourceManager has insight into resource health, it may
> detect failed TaskManagers much earlier than the heartbeat-based
> monitoring of the JobManager.
>
> At this stage, the ResourceManager is an optional component. That means
> the JobManager doesn't depend on the ResourceManager as long as it has
> enough resources to perform the computation. All bookkeeping is performed
> by the JobManager. When the ResourceManager connects to the JobManager, it
> receives the current resources, i.e. task manager instances, and allocates
> more containers if necessary. The JobManager adjusts the number of
> containers through the {{SetWorkerPoolSize}} method.
>
> In standalone mode, the ResourceManager may be deactivated or simply use
> the StandaloneResourceManager, which does practically nothing because we
> don't need to allocate resources in standalone mode.
>
> In YARN mode, the ResourceManager takes care of communicating with the
> YARN resource manager. When containers fail, it informs the JobManager and
> tries to allocate new containers. The ResourceManager runs as an actor
> within the same actor system as the JobManager. It could, however, also
> run independently. The independent mode would be the default behavior for
> Mesos, where the framework master is expected to just deal with resource
> allocation.
>
> The attached figures depict the message flow between ResourceManager,
> JobManager, and TaskManager.
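
For readers without the attached sketch, the interaction described in the issue can be approximated roughly as follows. This is only an illustrative Python model, not Flink code: the class and method names (`JobManager`, `ResourceManager`, `RecordingResourceManager`, `notify_registered`, `container_failed`, `request_containers`) are assumptions made for this sketch, and plain method calls stand in for the actor messages such as {{SetWorkerPoolSize}}.

```python
from dataclasses import dataclass, field


@dataclass
class JobManager:
    """Does all bookkeeping; works with or without a ResourceManager."""
    task_managers: set = field(default_factory=set)
    resource_manager: "ResourceManager | None" = None

    def register_task_manager(self, tm_id):
        # TaskManagers register with the JobManager, which then informs
        # the ResourceManager (if one is connected).
        self.task_managers.add(tm_id)
        if self.resource_manager is not None:
            self.resource_manager.notify_registered(tm_id)

    def task_manager_failed(self, tm_id):
        self.task_managers.discard(tm_id)


class ResourceManager:
    """Hypothetical base class: only acquires resources."""

    def __init__(self):
        self.job_manager = None
        self.pool_size = 0
        self.registered = set()
        self.pending = 0  # containers requested but not yet registered

    def connect(self, job_manager):
        # On connect, take over the JobManager's current task managers.
        self.job_manager = job_manager
        job_manager.resource_manager = self
        self.registered = set(job_manager.task_managers)
        self._reconcile()

    def set_worker_pool_size(self, size):
        # Stand-in for the SetWorkerPoolSize request from the JobManager.
        self.pool_size = size
        self._reconcile()

    def notify_registered(self, tm_id):
        self.registered.add(tm_id)
        self.pending = max(0, self.pending - 1)

    def container_failed(self, tm_id):
        # Inform the JobManager, then try to allocate a replacement.
        self.registered.discard(tm_id)
        self.job_manager.task_manager_failed(tm_id)
        self._reconcile()

    def _reconcile(self):
        missing = self.pool_size - len(self.registered) - self.pending
        if missing > 0:
            self.pending += self.request_containers(missing)

    def request_containers(self, count):
        """Return how many containers were actually requested."""
        raise NotImplementedError


class StandaloneResourceManager(ResourceManager):
    def request_containers(self, count):
        return 0  # nothing to allocate in standalone mode


class RecordingResourceManager(ResourceManager):
    """Stand-in for a YARN/Mesos-backed manager; only records requests."""

    def __init__(self):
        super().__init__()
        self.requests = []

    def request_containers(self, count):
        self.requests.append(count)
        return count
```

In this model, connecting a `RecordingResourceManager` to a JobManager that already has one TaskManager and then setting the pool size to 3 results in a single request for two containers; a later `container_failed` call removes the TaskManager from the JobManager's bookkeeping and requests one replacement.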