[ https://issues.apache.org/jira/browse/FLINK-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288227#comment-17288227 ]

Till Rohrmann commented on FLINK-20332:
---------------------------------------

Sounds good to me like this as a first step.

> Add workers recovered from previous attempt to pending resources
> ----------------------------------------------------------------
>
>                 Key: FLINK-20332
>                 URL: https://issues.apache.org/jira/browse/FLINK-20332
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Xintong Song
>            Assignee: Xintong Song
>            Priority: Major
>
> For active deployments (Native K8s/Yarn/Mesos), after a JM failover, workers
> from the previous attempt should register with the new JM. Depending on the
> order in which slot requests and TM registrations arrive at the RM, the RM
> may allocate unnecessary new resources while there are recovered resources
> that could be reused.
> A potential improvement is to add recovered workers to pending resources, so
> that the RM knows what resources are expected to be available soon and can
> decide whether to allocate new resources accordingly.
> See also the discussion in FLINK-20249.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)