There are no exceptions or warnings in the log of task manager
`'athena592-phx2/10.80.118.166:44177'`. The cluster monitoring dashboard
also shows that the host was not shut down. We probably need to turn on
DEBUG logging to get more useful information. If the task manager had been
killed, I would expect a termination message in its log. If there is none,
I don't know how to tell whether the failure was caused by the task manager
being killed or just by a connection timeout.
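
One thing that can be checked without DEBUG logging is how close together
the Akka gating warning and the task failure are in the job manager log. The
following is only a minimal sketch, assuming one log event per line in the
format shown in the quoted message below. The function names and the regular
expressions are my own illustration, not anything Flink provides.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the quoted job manager log.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"

# Akka warning emitted when the association with a remote system fails.
GATED = re.compile(
    TS + r".*?Association with remote system "
    r"\[akka\.tcp://flink@([^:\]]+):\d+\] has failed"
)
# ExecutionGraph message emitted when a task fails.
FAILED = re.compile(TS + r".*?switched from RUNNING to FAILED")


def parse_ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS.mmm' timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def find_events(lines):
    """Collect (time, host) for gating warnings and times of task failures."""
    gated, failed = [], []
    for line in lines:
        m = GATED.search(line)
        if m:
            gated.append((parse_ts(m.group(1)), m.group(2)))
            continue
        m = FAILED.search(line)
        if m:
            failed.append(parse_ts(m.group(1)))
    return gated, failed


def seconds_between_first(lines):
    """Seconds from the first gating warning to the first task failure."""
    gated, failed = find_events(lines)
    if not gated or not failed:
        return None
    return (failed[0] - gated[0][0]).total_seconds()


# The two relevant entries from the quoted log, joined onto single lines.
sample = [
    "2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN "
    "akka.remote.ReliableDeliverySupervisor "
    "flink-akka.remote.default-remote-dispatcher-25 - Association with remote "
    "system [akka.tcp://flink@athena592-phx2:44487] has failed, address is now "
    "gated for [5000] ms. Reason: [Disassociated]",
    "2019-04-12 12:49:14.230 [flink-akka.actor.default-dispatcher-65] INFO "
    "org.apache.flink.runtime.executiongraph.ExecutionGraph  - id (14/96) "
    "(93fcbfc535a190e1edcfd913d5f304fe) switched from RUNNING to FAILED.",
]

print(seconds_between_first(sample))
```

In the quoted log the two events are 55 ms apart, so both connections to
that host appear to have dropped at about the same time. That by itself does
not tell a killed process apart from a network problem.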



On Sun, Apr 14, 2019 at 7:22 PM zhijiang <wangzhijiang...@aliyun.com> wrote:

> Hi Wenrui,
>
> I think the akka gated issue and the inactive netty channel are both
> caused by a task manager exiting or being killed. You should double check
> the status of this task manager `'athena592-phx2/10.80.118.166:44177'`
> and the reason it went away.
>
> Best,
> Zhijiang
>
> ------------------------------------------------------------------
> From:Wenrui Meng <wenruim...@gmail.com>
> Send Time:April 13, 2019 (Saturday) 01:01
> To:user <user@flink.apache.org>
> Cc:tzulitai <tzuli...@apache.org>
> Subject:Netty channel closed at AKKA gated status
>
> We hit the Netty "channel inactive" issue while Akka had gated that task
> manager. I'm wondering whether the channel was closed because of the Akka
> gated status, since all messages to the task manager are dropped during
> that period, which might cause the Netty channel exception. If so, should
> there be some coordination between Akka and Netty? The gated status is not
> intended to fail the system. Here is the stack trace for the exception:
>
> 2019-04-12 12:46:38.413 [flink-akka.actor.default-dispatcher-90] INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Completed
> checkpoint 3758 (3788228399 bytes in 5967 ms).
> 2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN
> akka.remote.ReliableDeliverySupervisor
> flink-akka.remote.default-remote-dispatcher-25 - Association with remote
> system [akka.tcp://flink@athena592-phx2:44487] has failed, address is now
> gated for [5000] ms. Reason: [Disassociated]
> 2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN
> akka.remote.ReliableDeliverySupervisor
> flink-akka.remote.default-remote-dispatcher-25 - Association with remote
> system [akka.tcp://flink@athena592-phx2:44487] has failed, address is now
> gated for [5000] ms. Reason: [Disassociated]
> 2019-04-12 12:49:14.230 [flink-akka.actor.default-dispatcher-65] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  - id (14/96)
> (93fcbfc535a190e1edcfd913d5f304fe) switched from RUNNING to FAILED.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager 'athena592-phx2/
> 10.80.118.166:44177'. This might indicate that the remote task manager
> was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:117)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
>         at
> org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610)
>         at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:748)
>
>
>
