While experimenting in a cluster setting, I ran into some hardware failures that caused several TaskManagers to be unregistered, which in turn failed my streaming jobs. After a TaskManager dies, the logs also show some Akka exceptions. I think they are harmless compared to losing TaskManagers; I just wanted to report them.
20:26:17,813 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@127.0.0.1:56910]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:56910
20:26:22,811 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupedActiveDiscretizer -> BasicWindowBuffer -> GroupedValues-partial -> GroupedValues-total -> Window Flatten -> FormatCounts -> WriteCounts (10/20) (e691d84be7c1ab95bcab738b743dc299) switched from CANCELING to CANCELED
20:27:20,683 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.240.251.253:42117]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /10.240.251.253:42117
20:29:00,702 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.240.251.253:42117]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /10.240.251.253:42117
20:30:19,682 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.240.172.202:36898] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
20:30:19,779 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.240.172.202:36898]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
20:30:19,779 INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@10.240.172.202:36898/user/taskmanager terminated.
20:30:19,779 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupedActiveDiscretizer -> BasicWindowBuffer -> GroupByKeyOnly -> Window Flatten -> GroupAlsoByWindow (19/20) (e003610224684be03180e4f101c3367a) switched from CANCELING to FAILED
20:30:19,780 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - ReadLines -> Tokenizer -> Init -> ReifyTimestampsAndWindows (20/20) (e37268a9a671717f1cf9177e9372a861) switched from CANCELING to FAILED
20:30:19,781 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupedActiveDiscretizer -> BasicWindowBuffer -> Sum.PerKey-partial -> Sum.PerKey-total -> Window Flatten (19/20) (4eab0b82cfc266c190fc63569644b77e) switched from CANCELING to FAILED
20:30:19,781 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupedActiveDiscretizer -> BasicWindowBuffer -> Sum.PerKey-partial -> Sum.PerKey-total -> Window Flatten (20/20) (11656d30edd03a00ffda0f557221e152) switched from CANCELING to FAILED
20:30:19,782 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupedActiveDiscretizer -> BasicWindowBuffer -> GroupByKeyOnly -> Window Flatten -> GroupAlsoByWindow (20/20) (8dbde0fe41675032a7052df696c7f67d) switched from CANCELING to FAILED
20:30:19,782 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - ReadLines -> Tokenizer -> Init -> ReifyTimestampsAndWindows (10/20) (794ee1f56dea331b74bb27dd76579917) switched from CANCELING to FAILED
20:30:19,783 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager akka.tcp://flink@10.240.172.202:36898. Number of registered task managers 8. Number of available slots 16.
20:30:19,789 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@127.0.0.1:56910]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:56910
20:30:27,919 INFO org.apache.flink.runtime.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@10.240.172.202:36898/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
20:30:27,919 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at dataflow-benchmark-worker7 (akka.tcp://flink@10.240.172.202:36898/user/taskmanager) as 56cca34b618e37faa010d46079ff3968. Current number of registered hosts is 9.
20:30:33,080 ERROR Remoting - Error encountered while processing system message acknowledgement [4, 5] ACK[5, {3, 2, 1, 0}]
akka.remote.transport.Transport$InvalidAssociationException: Error encountered while processing system message acknowledgement [4, 5] ACK[5, {3, 2, 1, 0}]
Caused by: akka.remote.ResendUnfulfillableException: Unable to fulfill resend request since negatively acknowledged payload is no longer in buffer. The resend states between two systems are compromised and cannot be recovered.
    at akka.remote.AckedSendBuffer.acknowledge(AckedDelivery.scala:103)
    at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:288)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:185)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
20:30:33,085 INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@10.240.172.202:36898/user/taskmanager terminated.
20:30:33,086 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager akka.tcp://flink@10.240.172.202:36898. Number of registered task managers 8. Number of available slots 16.