Re: Task Manager shutdown causing jobs to fail

Puneet Duggal Mon, 07 Mar 2022 07:44:25 -0800

Hi Terry Wang,

To add to the context provided above: whenever a task manager goes down, the jobs 
go into a failed state and do not restart, even though there are more than enough 
free slots available on other task managers for them to restart on.


Regards,
Puneet

> On 04-Mar-2022, at 4:54 PM, Terry Wang <zjuwa...@gmail.com> wrote:
> 
> Hi, Puneet~
> 
> AFAIK, it is expected behavior that jobs on a crashed TaskManager 
> restart. HA means there is no single point of failure, but a Flink job still 
> needs to go through failover to ensure state and data consistency. You may refer to 
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/task_failure_recovery/
> for more details.
> 
> On Fri, Mar 4, 2022 at 2:50 AM Puneet Duggal <puneetduggal1...@gmail.com> wrote:
> Hi,
> 
> Currently in production, I have an HA session-mode Flink cluster with 3 job 
> managers and multiple task managers with more than enough free task slots. 
> But I have seen multiple times that whenever a task manager goes down (e.g. 
> due to a heartbeat issue), so do all the jobs running on it, even when there 
> are standby task managers available with free slots to run them on. Has 
> anyone faced this issue?
> 
> Regards, 
> Puneet
> 
> 
> -- 
> Best Regards,
> Terry Wang
