Hi Kamal,

From your description, I think this is related to Flink's fault tolerance mechanism. See [1] for more detail.
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/

Best,
Ron

On Fri, Aug 4, 2023 at 15:06, Kamal Mittal via user <user@flink.apache.org> wrote:

> Hello,
>
> How does Flink behave when one task manager pod fails out of a set of
> task manager pods, say in a K8s environment?
>
> In my case, the job remains in a failed state even after configuring a
> restart strategy with a fixed delay (5 sec) and a number of attempts (5),
> with the error “Could not acquire minimum resources”.
>
> Does Flink not wait in this case for a new task manager to come up, and
> instead immediately fail the job?
>
> Rgds,
> Kamal