While experimenting in a cluster setting, I ran into some hardware
failures that caused some TaskManagers to be unregistered and, as a
result, my streaming jobs to fail. In the logs after the TaskManager dies
I see some Akka exceptions. I think they are harmless compared to losing
TaskManagers; I just wanted to report them.

20:26:17,813 WARN  Remoting
     - Tried to associate with unreachable remote address [akka.tcp://
flink@127.0.0.1:56910]. Address is now gated for 5000 ms, all messages to
this address will be delivered to dead letters. Reason: Connection refused:
/127.0.0.1:56910
20:26:22,811 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
     - GroupedActiveDiscretizer -> BasicWindowBuffer ->
GroupedValues-partial -> GroupedValues-total -> Window Flatten ->
FormatCounts -> WriteCounts (10/20) (e691d84be7c1ab95bcab738b743dc299)
switched from CANCELING to CANCELED
20:27:20,683 WARN  Remoting
     - Tried to associate with unreachable remote address [akka.tcp://
flink@10.240.251.253:42117]. Address is now gated for 5000 ms, all messages
to this address will be delivered to dead letters. Reason: Connection
refused: /10.240.251.253:42117
20:29:00,702 WARN  Remoting
     - Tried to associate with unreachable remote address [akka.tcp://
flink@10.240.251.253:42117]. Address is now gated for 5000 ms, all messages
to this address will be delivered to dead letters. Reason: Connection
refused: /10.240.251.253:42117
20:30:19,682 WARN  akka.remote.ReliableDeliverySupervisor
     - Association with remote system [akka.tcp://flink@10.240.172.202:36898]
has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
20:30:19,779 WARN  Remoting
     - Tried to associate with unreachable remote address [akka.tcp://
flink@10.240.172.202:36898]. Address is now gated for 5000 ms, all
messages to this address will be delivered to dead letters. Reason: The
remote system has quarantined this system. No further associations to the
remote system are possible until this system is restarted.
20:30:19,779 INFO  org.apache.flink.runtime.jobmanager.JobManager
     - Task manager akka.tcp://flink@10.240.172.202:36898/user/taskmanager
terminated.
20:30:19,779 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
     - GroupedActiveDiscretizer -> BasicWindowBuffer -> GroupByKeyOnly ->
Window Flatten -> GroupAlsoByWindow (19/20)
(e003610224684be03180e4f101c3367a) switched from CANCELING to FAILED
20:30:19,780 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
     - ReadLines -> Tokenizer -> Init -> ReifyTimestampsAndWindows (20/20)
(e37268a9a671717f1cf9177e9372a861) switched from CANCELING to FAILED
20:30:19,781 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
     - GroupedActiveDiscretizer -> BasicWindowBuffer -> Sum.PerKey-partial
-> Sum.PerKey-total -> Window Flatten (19/20)
(4eab0b82cfc266c190fc63569644b77e) switched from CANCELING to FAILED
20:30:19,781 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
     - GroupedActiveDiscretizer -> BasicWindowBuffer -> Sum.PerKey-partial
-> Sum.PerKey-total -> Window Flatten (20/20)
(11656d30edd03a00ffda0f557221e152) switched from CANCELING to FAILED
20:30:19,782 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
     - GroupedActiveDiscretizer -> BasicWindowBuffer -> GroupByKeyOnly ->
Window Flatten -> GroupAlsoByWindow (20/20)
(8dbde0fe41675032a7052df696c7f67d) switched from CANCELING to FAILED
20:30:19,782 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
     - ReadLines -> Tokenizer -> Init -> ReifyTimestampsAndWindows (10/20)
(794ee1f56dea331b74bb27dd76579917) switched from CANCELING to FAILED
20:30:19,783 INFO  org.apache.flink.runtime.instance.InstanceManager
      - Unregistered task manager akka.tcp://flink@10.240.172.202:36898.
Number of registered task managers 8. Number of available slots 16.
20:30:19,789 WARN  Remoting
     - Tried to associate with unreachable remote address [akka.tcp://
flink@127.0.0.1:56910]. Address is now gated for 5000 ms, all messages to
this address will be delivered to dead letters. Reason: Connection refused:
/127.0.0.1:56910
20:30:27,919 INFO  org.apache.flink.runtime.instance.InstanceManager
      - Registering TaskManager at akka.tcp://
flink@10.240.172.202:36898/user/taskmanager which was marked as dead
earlier because of a heart-beat timeout.
20:30:27,919 INFO  org.apache.flink.runtime.instance.InstanceManager
      - Registered TaskManager at dataflow-benchmark-worker7 (akka.tcp://
flink@10.240.172.202:36898/user/taskmanager) as
56cca34b618e37faa010d46079ff3968. Current number of registered hosts is 9.
20:30:33,080 ERROR Remoting
     - Error encountered while processing system message acknowledgement
[4, 5] ACK[5, {3, 2, 1, 0}]
akka.remote.transport.Transport$InvalidAssociationException: Error
encountered while processing system message acknowledgement [4, 5] ACK[5,
{3, 2, 1, 0}]
Caused by: akka.remote.ResendUnfulfillableException: Unable to fulfill
resend request since negatively acknowledged payload is no longer in
buffer. The resend states between two systems are compromised and cannot be
recovered.
        at akka.remote.AckedSendBuffer.acknowledge(AckedDelivery.scala:103)
        at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:288)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:185)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
20:30:33,085 INFO  org.apache.flink.runtime.jobmanager.JobManager
     - Task manager akka.tcp://flink@10.240.172.202:36898/user/taskmanager
terminated.
20:30:33,086 INFO  org.apache.flink.runtime.instance.InstanceManager
      - Unregistered task manager akka.tcp://flink@10.240.172.202:36898.
Number of registered task managers 8. Number of available slots 16.
