TaoYang526 commented on a change in pull request #11248: [FLINK-16299] Release 
containers recovered from previous attempt in w…
URL: https://github.com/apache/flink/pull/11248#discussion_r386791133
 
 

 ##########
 File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java
 ##########
 @@ -464,7 +472,15 @@ public void onContainerStarted(ContainerId containerId, 
Map<String, ByteBuffer>
 
        @Override
        public void onContainerStatusReceived(ContainerId containerId, 
ContainerStatus containerStatus) {
-               // We are not interested in getting container status
+               // We fetch the status of the container from the previous 
attempts.
+               if (containerStatus.getState() == ContainerState.NEW) {
 
 Review comment:
   In most Hadoop versions, ContainerStatus#getState() returns only RUNNING
   (the container is starting or has started) or COMPLETE (it has finished);
   only a few versions also include NEW or SCHEDULED. So I think this condition
   can be expressed as "not RUNNING" here, and we should add a condition like
   `if (containerStatus.getState() != ContainerState.COMPLETE)` around the
   resourceManagerClient#releaseAssignedContainer call, since there is no need
   to release a container that has already completed.
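
   One possible reading of the check suggested above, as a minimal sketch. The
   `State` enum is a hypothetical stand-in for Hadoop's ContainerState, and
   `needsRelease` and the `released` list are illustrative names only, not
   part of YarnResourceManager:

```java
import java.util.ArrayList;
import java.util.List;

public class ContainerStatusCheck {
    // Hypothetical stand-in for org.apache.hadoop.yarn.api.records.ContainerState.
    enum State { NEW, SCHEDULED, RUNNING, COMPLETE }

    // Illustrative record of the containers that would be handed back to the RM.
    static final List<String> released = new ArrayList<>();

    // Release only containers that are neither running nor already finished.
    static boolean needsRelease(State state) {
        return state != State.RUNNING && state != State.COMPLETE;
    }

    static void onContainerStatusReceived(String containerId, State state) {
        if (needsRelease(state)) {
            // Assumption: the real code would call
            // resourceManagerClient.releaseAssignedContainer(containerId) here.
            released.add(containerId);
        }
    }

    public static void main(String[] args) {
        onContainerStatusReceived("container_1", State.NEW);
        onContainerStatusReceived("container_2", State.SCHEDULED);
        onContainerStatusReceived("container_3", State.RUNNING);
        onContainerStatusReceived("container_4", State.COMPLETE);
        System.out.println(released);
    }
}
```

   Writing the test as "neither RUNNING nor COMPLETE" avoids depending on
   whether a given Hadoop version reports NEW or SCHEDULED.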
   
   We should also handle this in the onGetContainerStatusError method, for
   containers that the last AM had not yet started by calling
   NMClient#startContainer.
   
   A last suggestion: consider consistency wherever internal state may be
   updated in this callback; it can be handled by calling runAsync(...).
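
   The hand-off idea can be sketched as follows, assuming a single-threaded
   executor as a stand-in for the resource manager's main thread. The class,
   its `runAsync` helper, and `pendingContainers` are hypothetical names for
   illustration, not Flink's actual implementation:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MainThreadHandOff {
    // Hypothetical stand-in for the endpoint's main thread.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();

    // Internal state; read and written only on mainThread, so it needs no lock.
    private final Set<String> pendingContainers = new HashSet<>();

    // Illustrative helper: schedule an action on the main thread.
    void runAsync(Runnable action) {
        mainThread.execute(action);
    }

    // Invoked on a client callback thread; the state update is handed off.
    void onContainerStatusReceived(String containerId) {
        runAsync(() -> pendingContainers.add(containerId));
    }

    // Reads the state on the main thread as well, after all queued updates.
    int pendingCount() {
        Callable<Integer> count = pendingContainers::size;
        try {
            return mainThread.submit(count).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        mainThread.shutdown();
    }

    public static void main(String[] args) {
        MainThreadHandOff rm = new MainThreadHandOff();
        rm.onContainerStatusReceived("container_1");
        rm.onContainerStatusReceived("container_2");
        System.out.println(rm.pendingCount());
        rm.shutdown();
    }
}
```

   Because every read and write of the state goes through the same single
   thread, updates triggered from callback threads cannot interleave with
   other changes to that state.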

