[
https://issues.apache.org/jira/browse/FLINK-38252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xingsuo-zbz updated FLINK-38252:
--------------------------------
Fix Version/s: 1.17.1
Affects Version/s: 1.17.0
(was: 1.17.2)
(was: 1.19.3)
> ResourceManager will not apply for a new pod when pending pod is deleted
> ------------------------------------------------------------------------
>
> Key: FLINK-38252
> URL: https://issues.apache.org/jira/browse/FLINK-38252
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.17.0
> Reporter: xingsuo-zbz
> Assignee: xingsuo-zbz
> Priority: Minor
> Fix For: 1.17.1
>
>
> Our Flink job is deployed on k8s.
>
> The SRE of the k8s cluster periodically cleans up pending pods, but Flink
> does not handle the deletion event for a pending pod. As a result, the Flink
> job never requests replacement pods and ultimately fails due to insufficient
> resources.
>
> This problem can be reproduced on a small k8s cluster.
> For example, suppose the k8s cluster has only 10 CPU cores in total and the
> Flink job is configured to request four 5-core pods. If the pending pods are
> deleted manually before the job's resource request times out, the
> ResourceManager will not request new pods.
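
The reproduction scenario above can be modelled with a small simulation. This is only a toy sketch of the bookkeeping, not Flink's ResourceManager or a Kubernetes client: the names `ToyCluster`, `ToyResourceManager`, `handle_pending_deletion`, and `simulate` are hypothetical, and the scheduling rule (a pod runs only if enough cores are free) is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Stand-in for a k8s cluster with a fixed CPU budget (not a real client)."""
    total_cores: int
    running: list = field(default_factory=list)  # core counts of scheduled pods
    pending: list = field(default_factory=list)  # core counts of unschedulable pods

    def request_pod(self, cores: int) -> None:
        # Assumption: a pod is scheduled only if enough cores are still free.
        if sum(self.running) + cores <= self.total_cores:
            self.running.append(cores)
        else:
            self.pending.append(cores)

    def delete_pending(self) -> int:
        """Simulate the periodic cleanup; return how many pods were removed."""
        removed = len(self.pending)
        self.pending.clear()
        return removed


@dataclass
class ToyResourceManager:
    """Hypothetical model of the bookkeeping, not Flink's implementation."""
    cluster: ToyCluster
    required_pods: int
    pod_cores: int
    handle_pending_deletion: bool
    requested: int = 0  # pods the manager believes are running or on their way

    def reconcile(self) -> None:
        # Request pods until the manager's own count matches the requirement.
        while self.requested < self.required_pods:
            self.cluster.request_pod(self.pod_cores)
            self.requested += 1

    def on_pending_deleted(self, count: int) -> None:
        # Without this handling the manager's count stays stale after cleanup.
        if self.handle_pending_deletion:
            self.requested -= count
            self.reconcile()


def simulate(handle_pending_deletion: bool) -> tuple:
    """Return (running pods, pending pods, pods the manager thinks it has)."""
    cluster = ToyCluster(total_cores=10)
    rm = ToyResourceManager(cluster, required_pods=4, pod_cores=5,
                            handle_pending_deletion=handle_pending_deletion)
    rm.reconcile()  # two 5-core pods fit in 10 cores, two stay pending
    rm.on_pending_deleted(cluster.delete_pending())
    return len(cluster.running), len(cluster.pending), rm.requested
```

With `simulate(False)` the manager still counts four requested pods although only two exist, which mirrors the reported behaviour; with `simulate(True)` the two deleted pods are requested again and return to the pending state.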
--
This message was sent by Atlassian Jira
(v8.20.10#820010)