xintongsong commented on a change in pull request #11323:
URL: https://github.com/apache/flink/pull/11323#discussion_r412637786



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesResourceManager.java
##########
@@ -239,70 +247,73 @@ private void recoverWorkerNodesFromPreviousAttempts() throws ResourceManagerExce
                        ++currentMaxAttemptId);
        }
 
-       private void requestKubernetesPod() {
-               numPendingPodRequests++;
+       private void requestKubernetesPod(WorkerResourceSpec workerResourceSpec) {
+               final KubernetesTaskManagerParameters parameters =
+                       createKubernetesTaskManagerParameters(workerResourceSpec);
+
+               final KubernetesPod taskManagerPod =
+                       KubernetesTaskManagerFactory.createTaskManagerComponent(parameters);
+               kubeClient.createTaskManagerPod(taskManagerPod)
+                       .whenComplete(
+                               (ignore, throwable) -> {
+                                       if (throwable != null) {
+                                               final Time retryInterval = configuration.getPodCreationRetryInterval();
+                                               log.error("Could not start TaskManager in pod {}, retry in {}. ",
+                                                       taskManagerPod.getName(), retryInterval, throwable);
+                                               scheduleRunAsync(
+                                                       () -> requestKubernetesPodIfRequired(workerResourceSpec),
+                                                       retryInterval);
+                                       } else {
+                                               podWorkerResources.put(parameters.getPodName(), workerResourceSpec);
+                                               final int pendingWorkerNum = notifyNewWorkerRequested(workerResourceSpec);

Review comment:
       True.
   I'll move these two lines to before the call to `kubeClient.createTaskManagerPod` 
(which runs on the main thread), and clean up the state if 
`kubeClient.createTaskManagerPod` completes exceptionally.
   To guarantee that the state cleanup also happens on the main thread, and before 
the retry, I'll wrap it in another `runAsync`.
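
   A minimal sketch of that ordering, to make the proposal concrete. It is not 
the actual patch: `PodRequestSketch`, `requestPod`, `createPod` and `demo` are 
hypothetical names, a single-thread executor stands in for the main thread, 
and the retry is immediate rather than delayed by the configured retry 
interval.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class PodRequestSketch {
    // Stand-in for the main thread: all state below is only touched from here.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    private final Map<String, String> podWorkerResources = new HashMap<>();
    private int pendingWorkerNum = 0;
    private int attempt = 0;
    // Hypothetical asynchronous client call (assumption, not the Flink API).
    private final Function<String, CompletableFuture<Void>> createPod;

    PodRequestSketch(Function<String, CompletableFuture<Void>> createPod) {
        this.createPod = createPod;
    }

    // Must be invoked on mainThread.
    private void requestPod(String spec, CompletableFuture<String> done) {
        final String podName = "pod-" + (++attempt);
        // Record the state before issuing the asynchronous call.
        podWorkerResources.put(podName, spec);
        pendingWorkerNum++;
        createPod.apply(podName).whenComplete((ignore, throwable) -> {
            if (throwable != null) {
                // The callback may run on another thread, so hop back to the
                // main thread, clean up first, and only then retry.
                mainThread.execute(() -> {
                    podWorkerResources.remove(podName);
                    pendingWorkerNum--;
                    requestPod(spec, done);
                });
            } else {
                done.complete(podName);
            }
        });
    }

    // Runs one request whose first `failures` creation attempts fail.
    static String demo(int failures) throws Exception {
        AtomicInteger remaining = new AtomicInteger(failures);
        PodRequestSketch rm = new PodRequestSketch(name -> CompletableFuture.runAsync(() -> {
            if (remaining.getAndDecrement() > 0) {
                throw new IllegalStateException("simulated failure for " + name);
            }
        }));
        CompletableFuture<String> done = new CompletableFuture<>();
        rm.mainThread.execute(() -> rm.requestPod("spec", done));
        String pod = done.get(10, TimeUnit.SECONDS);
        // Read the final state on the main thread as well.
        String summary = rm.mainThread
            .submit(() -> pod + ":" + rm.pendingWorkerNum + ":" + rm.podWorkerResources.keySet())
            .get();
        rm.mainThread.shutdown();
        return summary;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(2));
    }
}
```

   With two simulated failures, the entries for the failed attempts are removed 
before each retry, so only the successful pod remains tracked and the pending 
count ends at one.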




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

