[ 
https://issues.apache.org/jira/browse/FLINK-32883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755328#comment-17755328
 ] 

Yangze Guo commented on FLINK-32883:
------------------------------------

OK, I see your problem. The k8s operator forces the job parallelism to 
match the total number of slots. I'm not sure why this is enforced. 
Maybe [~wangyang0918] can give more information about it.

> Support for standby task managers
> ---------------------------------
>
>                 Key: FLINK-32883
>                 URL: https://issues.apache.org/jira/browse/FLINK-32883
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.6.0
>            Reporter: Tomoyuki NAKAMURA
>            Priority: Major
>
> [https://docs.ververica.com/user_guide/application_operations/deployments/scaling.html#run-with-standby-taskmanager]
> I would like the operator to support standby task managers, because on K8s 
> pods are often evicted or deleted due to node failure or autoscaling.
> With the current implementation, it is not possible to set up a standby task 
> manager, and jobs cannot run until all task managers are up and running. If 
> standby task managers were supported, a job could fail over to the standby 
> task manager and keep running without downtime, even if a task manager is 
> unexpectedly deleted.
> [https://github.com/apache/flink-kubernetes-operator/blob/release-1.6.0/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/FlinkConfigBuilder.java#L370-L380]
> Currently, if the task manager's number of replicas is set, the job's 
> parallelism setting is ignored. It should be possible to support standby task 
> managers by setting the parallelism to replicas * task slots automatically 
> only when the job's parallelism is not set (i.e. 0), and using the configured 
> value otherwise. 
> If this change looks good, I will send a PR on GitHub.
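
For illustration only, the rule proposed in the description above could be sketched roughly as follows. This is not the operator's actual code: the class name, method names, and parameters are hypothetical, and it assumes, as the reporter suggests, that a job parallelism of 0 means "not set".

```java
// Hypothetical sketch of the proposed rule; not taken from FlinkConfigBuilder.
final class StandbyParallelismSketch {

    // Returns the parallelism the job would run with under the proposal.
    // taskManagerReplicas may be null when replicas are not configured.
    static int effectiveParallelism(
            Integer taskManagerReplicas, int slotsPerTaskManager, int jobParallelism) {
        if (taskManagerReplicas == null) {
            // No replicas configured: use the job's own parallelism.
            return jobParallelism;
        }
        if (jobParallelism > 0) {
            // Proposed change: an explicitly set parallelism wins,
            // so any extra slots are left free as standby capacity.
            return jobParallelism;
        }
        // Parallelism unset (assumed to be 0): derive it from the total slots.
        return taskManagerReplicas * slotsPerTaskManager;
    }

    // Slots left unused by the job, i.e. the standby capacity.
    static int standbySlots(int taskManagerReplicas, int slotsPerTaskManager, int parallelism) {
        return Math.max(0, taskManagerReplicas * slotsPerTaskManager - parallelism);
    }

    public static void main(String[] args) {
        // 3 task managers with 2 slots each and an explicit parallelism of 4.
        int p = effectiveParallelism(3, 2, 4);
        System.out.println(p);                     // prints 4
        System.out.println(standbySlots(3, 2, p)); // prints 2
        // Parallelism not set: falls back to replicas * slots.
        System.out.println(effectiveParallelism(3, 2, 0)); // prints 6
    }
}
```

With 3 replicas of 2 slots each and a parallelism of 4, two slots (one task manager's worth) stay idle and can take over if a task manager pod is evicted.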



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
