TaoYang526 commented on a change in pull request #11248: [FLINK-16299] Release containers recovered from previous attempt in w…
URL: https://github.com/apache/flink/pull/11248#discussion_r386823957
##########
File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java
##########
@@ -464,7 +472,15 @@ public void onContainerStarted(ContainerId containerId, Map<String, ByteBuffer>
     @Override
     public void onContainerStatusReceived(ContainerId containerId, ContainerStatus containerStatus) {
-        // We are not interested in getting container status
+        // We fetch the status of the container from the previous attempts.
+        if (containerStatus.getState() == ContainerState.NEW) {

Review comment:

> Are you suggesting that calling NMClientAsync.getContainerStatusAsync on a NEW container might result in onGetContainerStatusError on some Hadoop versions while onContainerStatusReceived on other versions?

No, the two callbacks coexist in Hadoop. onContainerStatusReceived is invoked for containers that the AM has already started by calling NMClient#startContainers, while onGetContainerStatusError is invoked for containers that the AM has not started, or when the status query fails for other reasons such as a lost NM.

> If that is the case, I think we can have a common method handling releasing the container and removing it from the worker node map

Yes, a common method is necessary.

> One more question, how do we know whether a container is NEW or there's some other problems in onGetContainerStatusError?

There may be several causes for this callback, such as the container not being found on the NM or the NM being unreachable, but they can all be treated as the same problem: the container may not be usable for now because we cannot get its status. I think we can handle it as above, regardless of the real cause.
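A minimal sketch of the common handling discussed above, not the actual change in this PR. It assumes plain-Java stand-ins instead of the Hadoop and Flink types: `State`, `workerNodeMap`, `released`, and `releaseAndForget` are hypothetical names used only for illustration, and a real implementation would call the YARN client APIs where this sketch just records the container id.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RecoveredContainerSketch {

    /** Hypothetical stand-in for YARN's ContainerState; only a few values are modeled. */
    enum State { NEW, RUNNING, COMPLETE }

    /** Hypothetical stand-in for the worker node map: container id -> host. */
    private final Map<String, String> workerNodeMap = new ConcurrentHashMap<>();

    /** Records released container ids; a real AM would ask the RM client to release them. */
    private final List<String> released = new ArrayList<>();

    /** Registers a container recovered from a previous attempt. */
    public void registerRecoveredContainer(String containerId, String host) {
        workerNodeMap.put(containerId, host);
    }

    /** Status was fetched successfully: release only containers that were never started. */
    public void onContainerStatusReceived(String containerId, State state) {
        if (state == State.NEW) {
            releaseAndForget(containerId);
        }
    }

    /** Status could not be fetched: treat every cause the same way and release. */
    public void onGetContainerStatusError(String containerId, Throwable cause) {
        releaseAndForget(containerId);
    }

    /** Common path shared by both callbacks: drop the worker entry, then release. */
    private void releaseAndForget(String containerId) {
        if (workerNodeMap.remove(containerId) != null) {
            released.add(containerId);
        }
    }

    public List<String> releasedContainers() {
        return new ArrayList<>(released);
    }

    public Set<String> trackedContainers() {
        return new HashSet<>(workerNodeMap.keySet());
    }

    public static void main(String[] args) {
        RecoveredContainerSketch rm = new RecoveredContainerSketch();
        rm.registerRecoveredContainer("container_1", "host-a");
        rm.registerRecoveredContainer("container_2", "host-b");
        rm.onContainerStatusReceived("container_1", State.NEW);
        rm.onGetContainerStatusError("container_2", new RuntimeException("NM unreachable"));
        System.out.println("released: " + rm.releasedContainers());
    }
}
```

Routing both callbacks through one helper means the error path does not need to distinguish a missing container from an unreachable NM, which matches the suggestion to handle all causes the same way.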