Hey Ravinder,

can you please share the JobManager logs as well?

The logs say that the TaskManager disconnects from the JobManager,
because that one is not reachable anymore. At this point, the running
shuffles are cancelled and you see the follow up
RemoteTransportExceptions.

– Ufuk


On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <neetu0...@gmail.com> wrote:
> Hello All,
>
> I'm running the WordCount example streaming job and it fails because of loss
> of Taskmanagers.
>
> When gone through the logs of the taskmanager it has the following messages
>
> 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
> - State backend is set to heap memory (checkpoint to jobmanager)
> 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
> 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
> 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
> 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - TaskManager akka://flink/user/taskmanager disconnects from JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no
> longer reachable
> 15:27:29,384 WARN  akka.remote.RemoteWatcher
> - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>  at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Cancelling all computations and discarding all cached data.
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (17/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (20/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>  at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
> 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
> 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (6/25)
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (20/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (6/25) (2151d07383014506c5f1b0283b1986de).
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (15/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (13/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (6/25)
> 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (16/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (24/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (18/25)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
> 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (24/25)
> 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (13/25)
> 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Disassociating from JobManager
> 15:27:29,572 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has a UID that has been quarantined. Association aborted.
> 15:27:29,582 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,587 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,623 INFO  org.apache.flink.runtime.io.network.netty.NettyClient
> - Successful shutdown (took 17 ms).
> 15:27:29,625 INFO  org.apache.flink.runtime.io.network.netty.NettyServer
> - Successful shutdown (took 2 ms).
> 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
> 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
> 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
> 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
> 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
> 15:27:37,218 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout:
> 16000 milliseconds)
> 15:27:45,235 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30
> seconds)
> 15:28:01,237 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30
> seconds)
> 15:28:31,261 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:45,960 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30
> seconds)
> 15:29:01,273 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout:
> 30 seconds)
> 15:30:11,562 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:25,958 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
>
> Could anyone explain what is the reason for the systems getting
> disassociated and be quarantined?
>
> Kind Regards,
> Ravinder Kaur
>

Reply via email to