[ https://issues.apache.org/jira/browse/FLINK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423712#comment-16423712 ]

Till Rohrmann commented on FLINK-9120:
--------------------------------------

In order to properly recover from TM failures, the state should be accessible to all TMs, as [~sihuazhou] mentioned. Does this problem also happen with Flink 1.5?

I will unblock 1.5.0 from this issue, but we should still keep an eye on it.

> Task Manager Fault Tolerance issue
> ----------------------------------
>
>                 Key: FLINK-9120
>                 URL: https://issues.apache.org/jira/browse/FLINK-9120
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management, Configuration, Core
>    Affects Versions: 1.4.2
>            Reporter: dhiraj prajapati
>            Priority: Blocker
>         Attachments: flink-dhiraj.prajapati-client-ip-10-14-25-115.log,
> flink-dhiraj.prajapati-client-ip-10-14-25-115.log,
> flink-dhiraj.prajapati-jobmanager-5-ip-10-14-25-115.log,
> flink-dhiraj.prajapati-jobmanager-5-ip-10-14-25-115.log,
> flink-dhiraj.prajapati-taskmanager-5-ip-10-14-25-116.log,
> flink-dhiraj.prajapati-taskmanager-5-ip-10-14-25-116.log
>
>
> Hi,
> I have set up a Flink 1.4 cluster with one job manager and two task managers.
> The configs taskmanager.numberOfTaskSlots and parallelism.default were set
> to 2 on each node. I submitted a job to this cluster and it ran fine. To
> test fault tolerance, I killed one task manager. I expected the job to keep
> running because one of the two task managers was still up.
> However, the job failed. Am I missing something?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)