Hi 刘建刚,

Could you explain how you fixed the problem in your case? Did you modify the Flink code to use `IdleStateHandler`?
Piotrek

> On 13 Feb 2020, at 11:10, 刘建刚 <liujiangangp...@gmail.com> wrote:
>
> Thanks for all the help. Following the advice, I have fixed the problem.
>
>> On 13 Feb 2020, at 6:05 PM, Zhijiang <wangzhijiang...@aliyun.com> wrote:
>>
>> Thanks for reporting this issue. I also agree with the analysis below.
>> We actually encountered the same issue several years ago and also solved it
>> via the Netty idle handler.
>>
>> Let's track it via the ticket [1] as the next step.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-16030
>>
>> Best,
>> Zhijiang
>>
>> ------------------------------------------------------------------
>> From: 张光辉 <beggingh...@gmail.com>
>> Send Time: 2020 Feb. 12 (Wed.) 22:19
>> To: Benchao Li <libenc...@gmail.com>
>> Cc: 刘建刚 <liujiangangp...@gmail.com>; user <user@flink.apache.org>
>> Subject: Re: Encountered error while consuming partitions
>>
>> Networks can fail in many ways, some of them quite subtle (e.g. a high
>> packet loss ratio).
>>
>> The problem is that the long-lived TCP connection between the Netty client
>> and server is lost. The server then fails to send a message to the client
>> and shuts down the channel. The Netty client does not know that the
>> connection has been dropped, so it keeps waiting.
>>
>> There are two ways to detect whether a long-lived TCP connection between
>> the Netty client and server is still alive: TCP keepalive and heartbeats.
>> TCP keepalive defaults to 2 hours. When the error occurs, if you keep
>> waiting for 2 hours, the Netty client will raise an exception and enter
>> failover recovery.
>> If you want to detect a dead connection quickly, Netty provides
>> IdleStateHandler, which can drive a ping-pong (heartbeat) mechanism. If the
>> Netty client sends n consecutive ping messages and receives no pong message
>> in return, it raises an exception.
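
For readers following the thread, the "n pings without a pong" rule described above can be sketched in plain Java. This is only an illustration of the counting logic, not the change made for FLINK-16030 and not Netty code: the class `HeartbeatMonitor` and its methods are hypothetical names. In a real Netty pipeline, `onIdle()` would typically be called from a handler's `userEventTriggered` when `IdleStateHandler` fires an `IdleStateEvent`, and `onPongReceived()` when a pong message is read.

```java
// Hypothetical sketch of the ping-pong liveness rule; not Flink or Netty code.
final class HeartbeatMonitor {
    private final int maxUnansweredPings; // the "n" from the description above
    private int unansweredPings;          // pings sent since the last pong

    HeartbeatMonitor(int maxUnansweredPings) {
        if (maxUnansweredPings <= 0) {
            throw new IllegalArgumentException("maxUnansweredPings must be positive");
        }
        this.maxUnansweredPings = maxUnansweredPings;
    }

    // Call on every idle timeout. If n pings are already unanswered, the peer
    // is treated as dead; otherwise the caller should send one more ping.
    void onIdle() {
        if (unansweredPings >= maxUnansweredPings) {
            throw new IllegalStateException(
                "no pong received after " + unansweredPings + " pings");
        }
        unansweredPings++;
    }

    // Call whenever a pong arrives; any pong proves the connection is alive.
    void onPongReceived() {
        unansweredPings = 0;
    }

    int unansweredPings() {
        return unansweredPings;
    }
}
```

With an idle timeout of t seconds and a limit of n, a silent connection would be detected after roughly (n + 1) * t seconds instead of the 2-hour keepalive default mentioned above.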