Hi, Kamal

From your description, I think this is related to Flink's fault tolerance
mechanism. See [1] for more details.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/
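
As a rough illustration of how a fixed-delay strategy can run out of attempts
before a replacement TaskManager pod is ready, here is a minimal sketch in plain
Java. It is not Flink code and does not model Flink's actual scheduler. The class
name, the method name, and the assumption that every restart attempt fails
immediately when resources are missing are all hypothetical simplifications.

```java
import java.time.Duration;

public class FixedDelayRestartSketch {

    /**
     * Simplified model (an assumption, not Flink's real scheduling logic):
     * each restart attempt happens `delay` after the previous failure and
     * fails immediately if the replacement TaskManager is not yet available.
     *
     * @return true if the job gets back to running before attempts run out
     */
    static boolean recovers(int maxAttempts, Duration delay, Duration replacementReadyAfter) {
        Duration elapsed = Duration.ZERO;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            elapsed = elapsed.plus(delay); // wait the fixed delay before restarting
            if (elapsed.compareTo(replacementReadyAfter) >= 0) {
                return true; // resources are back, this attempt succeeds
            }
            // Otherwise the attempt fails for lack of resources and is used up.
        }
        return false; // all attempts exhausted, job ends up failed
    }

    public static void main(String[] args) {
        Duration delay = Duration.ofSeconds(5);
        // Replacement pod ready after 20 s: inside the 5 x 5 s retry window.
        System.out.println(recovers(5, delay, Duration.ofSeconds(20)));
        // Replacement pod ready after 60 s: the retry window is already over.
        System.out.println(recovers(5, delay, Duration.ofSeconds(60)));
    }
}
```

Under these assumptions, 5 attempts with a 5 sec delay cover only about 25
seconds, so a pod that takes longer to come back would not be picked up unless
the number of attempts or the delay is increased.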

Best,
Ron

Kamal Mittal via user <user@flink.apache.org> wrote on Fri, Aug 4, 2023 at 15:06:

> Hello,
>
> How does Flink behave when one TaskManager pod out of a set of TaskManager
> pods fails, for example in a K8s environment?
>
> In my case, the job remains in the failed state even though I configured a
> fixed-delay restart strategy (delay of 5 sec, 5 attempts), with the error
> “Could not acquire minimum resources”.
>
> Does Flink not wait for a new TaskManager to come up in this case, and
> instead fail the job immediately?
>
> Regards,
> Kamal
