For those who come looking for an answer later: the fix is available in 1.18:
https://issues.apache.org/jira/browse/FLINK-31498
The proposed solution is to not request new TaskManagers while some slots
are still pending.
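
The idea, roughly, as a minimal sketch with hypothetical names (this is not
the actual patch, just an illustration of the guard it adds): pods that are
still pending already count toward the declared demand, so they are
subtracted before any new request is made.

    // Hypothetical illustration of the FLINK-31498 guard, not Flink code.
    final class WorkerRequestGuard {

        // Pending pods already count toward the declared demand, so they
        // are subtracted before asking Kubernetes for more TaskManagers.
        static int workersToRequest(int declared, int registered, int pending) {
            return Math.max(0, declared - registered - pending);
        }

        public static void main(String[] args) {
            // Numbers from the log extract quoted below: 5 declared,
            // 4 registered, 1 already pending -> nothing new is requested.
            System.out.println(workersToRequest(5, 4, 1)); // prints 0
        }
    }

Without that subtraction, every check sees registered < declared and asks
for one more pod even though one is already pending, which matches the loop
in the quoted logs.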

On Thu, Jul 4, 2024 at 2:00 PM Alex Nitavsky <alexnitav...@gmail.com> wrote:

> Hello community,
>
> I need your help and advice to troubleshoot an unexpected issue with Flink
> version 1.17.2. I'm facing a problem related to Kubernetes (K8s) pod
> allocation.
>
> I saw strange behaviour when Flink was allocating new TM pods. Flink was
> requesting new pods in a loop every 30 seconds. The newly allocated pods
> were stuck in the Pending state due to a scheduling issue on the K8s side,
> and Flink kept repeating its demand.
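>
> (A quick way to confirm the pile-up from Java, for anyone reproducing
> this: list the pods stuck in the Pending phase. Flink itself talks to K8s
> through the fabric8 client; the snippet assumes fabric8 6.x and a "flink"
> namespace, so adjust for your deployment.)
>
>     import io.fabric8.kubernetes.client.KubernetesClient;
>     import io.fabric8.kubernetes.client.KubernetesClientBuilder;
>
>     public class PendingPodCheck {
>         public static void main(String[] args) {
>             try (KubernetesClient client = new KubernetesClientBuilder().build()) {
>                 // Print the names of all pods stuck in the Pending phase;
>                 // the stuck TMs show up as flink-...-taskmanager-1-N.
>                 client.pods()
>                         .inNamespace("flink") // assumption, adjust as needed
>                         .withField("status.phase", "Pending")
>                         .list()
>                         .getItems()
>                         .forEach(p -> System.out.println(p.getMetadata().getName()));
>             }
>         }
>     }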
>
> The interesting part is that Flink recognised that the number of pending
> pods was increasing, but it didn't stop requesting new TMs. Pods were
> created, but never registered.
>
> An extract of the logs is below (full logs for
> `@logger_name:org.apache.flink.kubernetes.* OR
> @logger_name:org.apache.flink.runtime.resourcemanager.active.*` are
> attached; a log4j snippet to capture these loggers follows the extract):
>
>
>    - need request 1 new workers, current worker number 4, declared worker
>    number 5
>    - Requesting new worker with resource spec WorkerResourceSpec {...},
>    current pending count: 1.
>    - Creating new TaskManager pod with name
>    flink-metering-evp-taskmanager-1-6 and resource <61440,6.0>.
>    - Pod flink-metering-evp-taskmanager-1-6 is created.
>    - need request 1 new workers, current worker number 5, declared worker
>    number 6
>    - Requesting new worker with resource spec WorkerResourceSpec {...},
>    current pending count: 2.
>    - Creating new TaskManager pod with name
>    flink-metering-evp-taskmanager-1-7 and resource <61440,6.0>.
>    - Pod flink-metering-evp-taskmanager-1-7 is created.
>    - need request 1 new workers, current worker number 6, declared worker
>    number 7
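>
> (For anyone who wants to capture the same loggers: with the default
> log4j2 properties setup that Flink ships, a snippet like the following in
> conf/log4j.properties would do it; the logger ids "k8s" and "rm" are
> arbitrary:)
>
>     logger.k8s.name = org.apache.flink.kubernetes
>     logger.k8s.level = DEBUG
>     logger.rm.name = org.apache.flink.runtime.resourcemanager.active
>     logger.rm.level = DEBUG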
>
>
> I am not sure if this is some kind of race condition in the counter
> updates or a deliberate choice to tackle the scheduling issue.
>
> Kind Regards
> Oleksandr
> ...
>
>
