xintongsong commented on pull request #15524: URL: https://github.com/apache/flink/pull/15524#issuecomment-846704389
@tillrohrmann Not necessarily. AFAIK, the Yarn RM requires each AM to register only once, but it does not require the AM to keep using the same `AMRMClient(Async)`. That means that in a later leader session we can instantiate a new `AMRMClient(Async)` to interact with the Yarn RM, as long as the AM was registered in the first leader session. This is my understanding based on the Yarn interfaces and docs, and it needs further verification to be sure. If it proves true, we have two options. We could catch and ignore the registration exception when it is caused by a duplicate registration. Alternatively, RMService (which is responsible for leader election and for starting the RM on obtaining leadership) could track whether this is the first leader session and skip the registration if it is not.

Another challenge I can see now is handling resource changes between two leader sessions. Currently, the new leader RM relies on `RegisterApplicationMasterResponse` to find out which containers have already been allocated. This relies on the assumption that existing containers must have come from previous attempts. With multiple leader sessions, inheriting containers from previous leader sessions becomes non-trivial, because `AMRMClient(Async)` does not provide an interface for getting all currently allocated containers. Two potential solutions are:
- Leveraging `YarnClient#getContainers`. `YarnClient` is meant to be used on the client side rather than by an AM, so I'm not sure whether there are any traps in using it from an AM. Hopefully not.
- Alternatively, we can maintain a Yarn-specific component outside Flink's RM (as you've mentioned). The component can be reused across multiple Flink RMs / leader sessions. It would be responsible for registering the AM with the Yarn RM, as well as for receiving the resource events from the Yarn RM and forwarding them to the leader RM.

Personally, I think skipping the registration when not in the first leader session and leveraging `YarnClient` sounds promising.
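To make the "register only in the first leader session, then ask Yarn for all allocated containers" idea a bit more concrete, here is a minimal, self-contained sketch. It deliberately avoids the real Hadoop classes: `YarnRmGateway`, `FakeYarnRm` and `AttemptScopedRegistration` are hypothetical names. `registerApplicationMaster()` stands in for `AMRMClient(Async)#registerApplicationMaster` and the containers-from-previous-attempts part of its `RegisterApplicationMasterResponse`. `listAllocatedContainers()` stands in for a `YarnClient#getContainers`-style lookup. The fake also assumes, as discussed above, that the Yarn RM rejects a second registration from the same attempt. That behavior still needs to be verified against real Yarn.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaderSessionRegistrationSketch {

    /** Hypothetical stand-in for the calls the AM makes to the Yarn RM. */
    interface YarnRmGateway {
        /** Models AM registration; returns containers from previous attempts. */
        List<String> registerApplicationMaster();

        /** Models a YarnClient#getContainers-style lookup of allocated containers. */
        List<String> listAllocatedContainers();
    }

    /** Hypothetical fake that (by assumption) rejects a second registration. */
    static final class FakeYarnRm implements YarnRmGateway {
        private boolean registered = false;
        final List<String> allocated = new ArrayList<>();

        @Override
        public List<String> registerApplicationMaster() {
            if (registered) {
                throw new IllegalStateException("AM already registered");
            }
            registered = true;
            // First attempt in this demo: no containers from previous attempts.
            return new ArrayList<>();
        }

        @Override
        public List<String> listAllocatedContainers() {
            return new ArrayList<>(allocated);
        }
    }

    /**
     * Hypothetical attempt-scoped component (e.g. owned by RMService) that
     * outlives individual leader sessions and remembers whether the AM has
     * already registered.
     */
    static final class AttemptScopedRegistration {
        private final YarnRmGateway rm;
        private boolean firstLeaderSession = true;

        AttemptScopedRegistration(YarnRmGateway rm) {
            this.rm = rm;
        }

        /** Called when a new Flink RM gains leadership; returns containers to inherit. */
        List<String> onLeadershipGranted() {
            if (firstLeaderSession) {
                firstLeaderSession = false;
                // Only the first leader session registers; the response carries
                // the containers left over from previous application attempts.
                return rm.registerApplicationMaster();
            }
            // Later leader sessions skip registration and instead look up all
            // containers that are currently allocated to this attempt.
            return rm.listAllocatedContainers();
        }
    }

    public static void main(String[] args) {
        FakeYarnRm yarn = new FakeYarnRm();
        AttemptScopedRegistration registration = new AttemptScopedRegistration(yarn);

        List<String> first = registration.onLeadershipGranted();
        // Containers allocated while the first leader session was active.
        yarn.allocated.add("container_01");
        yarn.allocated.add("container_02");
        List<String> second = registration.onLeadershipGranted();

        System.out.println("first session inherits " + first);   // prints "first session inherits []"
        System.out.println("second session inherits " + second); // prints "second session inherits [container_01, container_02]"
    }
}
```

The flag lives outside the Flink RM because a new RM instance is created per leader session, so only something scoped to the whole attempt can know whether registration already happened.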
If it turns out that we have to maintain something Yarn-specific across multiple Flink RMs, I'm leaning towards not introducing such complexity and keeping things as they are, since each attempt has only one leader session. And again, I'd suggest scoping this issue out of the current PR.