tillrohrmann commented on a change in pull request #11323:
URL: https://github.com/apache/flink/pull/11323#discussion_r414358295



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesResourceManager.java
##########
@@ -320,5 +333,16 @@ private void internalStopPod(String podName) {
                                        }
                                }
                        );
+
+               final KubernetesWorkerNode kubernetesWorkerNode = workerNodes.remove(resourceId);
+               final WorkerResourceSpec workerResourceSpec = podWorkerResources.remove(podName);
+
+               // If the stopped pod is requested in the current attempt (workerResourceSpec is known) and is not yet added,
+               // we need to notify ActiveResourceManager to decrease the pending worker count.
+               if (workerResourceSpec != null && kubernetesWorkerNode == null) {

Review comment:
   In its current state, with the call to `requestKubernetesPodIfRequired`, this should now work.

   I think we are still lacking a bit of test coverage, though. For example, if a recovered pod is being used by a job and that pod fails, it should be restarted because the `SlotManager` still needs it.
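   To make the missing scenario concrete, here is a minimal sketch of the behaviour such a test would assert. It is a toy model, not Flink code: `PodRestartSketch`, `recoverPod`, `onPodFailed` and `pendingCount` are hypothetical names, and the "used by a job" flag merely stands in for whatever the `SlotManager` reports as required.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model only: none of these names are Flink APIs.
class PodRestartSketch {
    // Pods requested in the current attempt that have not registered yet.
    private final Set<String> pendingPods = new HashSet<>();
    // Pods known to the manager (e.g. recovered from a previous attempt),
    // mapped to whether a job is currently using them.
    private final Map<String, Boolean> knownPods = new HashMap<>();
    private int nextPodId = 0;

    // Requests a new pod and tracks it as pending.
    String requestPod() {
        String podName = "pod-" + nextPodId++;
        pendingPods.add(podName);
        return podName;
    }

    // Registers a pod that was recovered from a previous attempt.
    void recoverPod(String podName, boolean usedByJob) {
        knownPods.put(podName, usedByJob);
    }

    // Handles a pod failure; returns the replacement pod name if one was requested.
    Optional<String> onPodFailed(String podName) {
        boolean wasPending = pendingPods.remove(podName);
        boolean wasInUse = Boolean.TRUE.equals(knownPods.remove(podName));
        // Assumption: a replacement is needed if the pod was still pending
        // or if a job was using it.
        if (wasPending || wasInUse) {
            return Optional.of(requestPod());
        }
        return Optional.empty();
    }

    int pendingCount() {
        return pendingPods.size();
    }

    public static void main(String[] args) {
        PodRestartSketch rm = new PodRestartSketch();
        rm.recoverPod("recovered-pod", true);
        Optional<String> replacement = rm.onPodFailed("recovered-pod");
        System.out.println("replacement requested: " + replacement.isPresent());
        System.out.println("pending pods: " + rm.pendingCount());
    }
}
```

   A real test would of course drive `KubernetesResourceManager` with its testing utilities instead of this stand-in; the sketch only pins down the expected outcome: a failed recovered pod that is in use leads to exactly one new pending pod, while an idle one does not.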




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

