Xintong Song created FLINK-20332:
------------------------------------

             Summary: Add workers recovered from previous attempt to pending resources
                 Key: FLINK-20332
                 URL: https://issues.apache.org/jira/browse/FLINK-20332
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination
            Reporter: Xintong Song
            Assignee: Xintong Song


For active deployments (Native K8s/Yarn/Mesos), after a JM failover, workers 
from the previous attempt should register with the new JM. Depending on the 
order in which slot requests and TM registrations arrive at the RM, the RM may 
allocate unnecessary new resources even though recovered resources could be 
reused.

A potential improvement is to add recovered workers to pending resources, so 
that the RM knows which resources are expected to become available soon and 
can decide whether to allocate new resources accordingly.
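
To illustrate the idea, here is a minimal sketch in Java. It is not Flink's 
actual ResourceManager code: the class RecoveredWorkerSketch and all of its 
methods and fields are hypothetical, and it assumes for simplicity that every 
worker provides exactly one slot, so requirements are counted in workers.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the proposal; none of these names exist in Flink. */
final class RecoveredWorkerSketch {
    // Workers recovered from the previous attempt that have not registered yet.
    private final Set<String> recoveredNotYetRegistered = new HashSet<>();
    // New workers requested from the cluster manager that have not registered yet.
    private int pendingNewWorkers = 0;
    // Workers that have registered with the new JM.
    private int registeredWorkers = 0;

    /** Called after JM failover with the workers found from the previous attempt. */
    void onWorkersRecovered(Collection<String> workerIds) {
        recoveredNotYetRegistered.addAll(workerIds);
    }

    /** Called when a TM registers at the RM. */
    void onWorkerRegistered(String workerId) {
        if (!recoveredNotYetRegistered.remove(workerId) && pendingNewWorkers > 0) {
            // Not a recovered worker, so it fulfils one of the new requests.
            pendingNewWorkers--;
        }
        registeredWorkers++;
    }

    /**
     * Called when slot requests arrive. Recovered workers count as pending
     * resources, so new workers are requested only for the remaining gap.
     * Returns the number of new workers requested by this call.
     */
    int requestWorkersIfNeeded(int requiredWorkers) {
        int expected = registeredWorkers
                + recoveredNotYetRegistered.size()
                + pendingNewWorkers;
        int missing = Math.max(0, requiredWorkers - expected);
        pendingNewWorkers += missing;
        return missing;
    }
}
```

With two recovered workers, a requirement of two workers triggers no new 
allocation in this sketch, regardless of whether the slot requests arrive 
before or after the TM registrations. The sketch does not cover recovered 
workers that never register; a real implementation would presumably need to 
handle that case.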

See also the discussion in FLINK-20249.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
