We encountered a Netty channel inactive issue while Akka had gated that task manager. I'm wondering whether the channel was closed because of the Akka gated status, since all messages to the TaskManager are dropped during that period, which might cause the Netty channel exception. If so, should there be coordination between Akka and Netty? The gated status is not intended to fail the system. Here is the stack trace for the exception:
2019-04-12 12:46:38.413 [flink-akka.actor.default-dispatcher-90] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 3758 (3788228399 bytes in 5967 ms).
2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-25 - Association with remote system [akka.tcp://flink@athena592-phx2:44487] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-25 - Association with remote system [akka.tcp://flink@athena592-phx2:44487] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
2019-04-12 12:49:14.230 [flink-akka.actor.default-dispatcher-65] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - id (14/96) (93fcbfc535a190e1edcfd913d5f304fe) switched from RUNNING to FAILED.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'athena592-phx2/10.80.118.166:44177'. This might indicate that the remote task manager was lost.
	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:117)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
	at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
	at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
	at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610)
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:748)
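
To see how closely the Akka gating warning and the task failure line up, here is a minimal sketch that computes the gap between the two timestamps copied from the log above. The class and method names (`LogGap`, `gapMillis`) are made up for illustration and are not part of Flink or Akka; the timestamp pattern is an assumption based on how the log lines are formatted.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flink or Akka.
class LogGap {
    // Assumed layout of the timestamp prefix in the log lines above.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Milliseconds elapsed from the earlier timestamp to the later one.
    static long gapMillis(String earlier, String later) {
        LocalDateTime a = LocalDateTime.parse(earlier, FMT);
        LocalDateTime b = LocalDateTime.parse(later, FMT);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        String gated = "2019-04-12 12:49:14.175";  // Akka "address is now gated" warning
        String failed = "2019-04-12 12:49:14.230"; // task switched from RUNNING to FAILED
        System.out.println("gap ms: " + gapMillis(gated, failed)); // prints "gap ms: 55"
    }
}
```

The two events are 55 ms apart, and they refer to different ports on the same host (44487 for the Akka association, 44177 for the Netty connection). The timing alone does not tell which one caused the other, or whether both are symptoms of the same underlying problem.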