Hello,

I have a standalone Flink cluster with JobManager HA. Last night, the JobManager failed over because of a connection timeout to ZooKeeper. The job is running successfully under the new leader JobManager, but when I look at the old leader JobManager's log, it keeps trying to re-submit the job and getting errors (for almost 24 hours now).
Here is the log.

-----
2016-07-27 20:56:09,218 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard message LeaderSessionMessage(54757d58-64d0-4118-a4d3-5f089287f1e4,07/27/2016 20:56:09 Job execution switched to status RESTARTING.) because the expected leader session ID None did not equal the received leader session ID Some(54757d58-64d0-4118-a4d3-5f089287f1e4).
2016-07-27 20:56:19,218 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
2016-07-27 20:56:19,218 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard message LeaderSessionMessage(54757d58-64d0-4118-a4d3-5f089287f1e4,07/27/2016 20:56:19 Job execution switched to status CREATED.) because the expected leader session ID None did not equal the received leader session ID Some(54757d58-64d0-4118-a4d3-5f089287f1e4).
2016-07-27 20:56:19,219 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 1 checkpoints in ZooKeeper.
2016-07-27 20:56:19,221 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Initialized with Checkpoint 40229 @ 1469620528216 for 978ef000cca5a3aa6f3461274102f82c. Removing all older checkpoints.
2016-07-27 20:56:19,222 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard message LeaderSessionMessage(54757d58-64d0-4118-a4d3-5f089287f1e4,07/27/2016 20:56:19 Job execution switched to status RUNNING.) because the expected leader session ID None did not equal the received leader session ID Some(54757d58-64d0-4118-a4d3-5f089287f1e4).
2016-07-27 20:56:19,222 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/3) (bbdf55db0c19cc881c188bc6925929d0) switched from CREATED to SCHEDULED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/3) (bbdf55db0c19cc881c188bc6925929d0) switched from SCHEDULED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (2/3) (4c795c671ec7b548b5faac5b141c331c) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard message LeaderSessionMessage(54757d58-64d0-4118-a4d3-5f089287f1e4,07/27/2016 20:56:19 Job execution switched to status FAILING.) because the expected leader session ID None did not equal the received leader session ID Some(54757d58-64d0-4118-a4d3-5f089287f1e4).
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (3/3) (fce3b243e5b25041aafabbd93a266dbc) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/3) (e1e5154f506901539e12b0fe8c140503) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (2/3) (f95eb0ad8fcc50e6bb9046e8700e8778) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (3/3) (0e30de47933282533cf6dda3a22e7ddc) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map (1/3) (ea260b7740d4ac8262c6500429b0ee6b) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map (2/3) (cc5ab4fc296238d32078d2b4a8eb5062) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map (3/3) (9694ae32fc12ec416197308f6a8cb3c1) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - TriggerWindow(GlobalWindows(), FoldingStateDescriptor{name=window-contents, defaultValue=ViewerCountHll(0,0,,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus@1), serializer=null}, LiveContinuousProcessingTimeTriggerGlobal(10000), WindowedStream.fold(WindowedStream.java:207)) -> Filter -> Map -> Filter -> Sink: Unnamed (1/3) (9c6b27873b6ddec58ce3f82f62632152) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - TriggerWindow(GlobalWindows(), FoldingStateDescriptor{name=window-contents, defaultValue=ViewerCountHll(0,0,,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus@1), serializer=null}, LiveContinuousProcessingTimeTriggerGlobal(10000), WindowedStream.fold(WindowedStream.java:207)) -> Filter -> Map -> Filter -> Sink: Unnamed (2/3) (47442827157e04f7e1936ec1d5c876e9) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - TriggerWindow(GlobalWindows(), FoldingStateDescriptor{name=window-contents, defaultValue=ViewerCountHll(0,0,,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus@1), serializer=null}, LiveContinuousProcessingTimeTriggerGlobal(10000), WindowedStream.fold(WindowedStream.java:207)) -> Filter -> Map -> Filter -> Sink: Unnamed (3/3) (a1436ef922932ffbb38f5c23684a43ec) switched from CREATED to CANCELED
2016-07-27 20:56:19,223 INFO org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy - Delaying retry of job execution for 10000 ms ...
2016-07-27 20:56:19,223 WARN org.apache.flink.runtime.jobmanager.JobManager - Discard message LeaderSessionMessage(54757d58-64d0-4118-a4d3-5f089287f1e4,07/27/2016 20:56:19 Job execution switched to status RESTARTING.) because the expected leader session ID None did not equal the received leader session ID Some(54757d58-64d0-4118-a4d3-5f089287f1e4).
----

Could anyone advise me why this happens and how I can recover from this situation? (restart JobManager?)

Regards,
Hironori Ogibayashi