Hi Ufuk,

I am willing to work on this issue and have a basic solution for it. I would like to get your professional advice on it. What is the next step? Looking forward to your reply!

Zhijiang Wang

------------------------------------------------------------------
From: Ufuk Celebi <u...@apache.org>
Sent: Tuesday, May 24, 2016 01:19
To: user <user@flink.apache.org>; wangzhijiang999 <wangzhijiang...@aliyun.com>
Subject: Re: problem of sharing TCP connection when transferring data

On Mon, May 23, 2016 at 6:55 PM, wangzhijiang999 <wangzhijiang...@aliyun.com> wrote:
> In summary, if one task sets autoread to false, and when it notifies the
> available buffer, there are some messages from this period to be processed
> first; if one of these messages belongs to another failed task, autoread for
> this channel will never be set to true again. The only way is to cancel all
> the tasks in this channel to release the channel. Is that right?

Yes, very good observation. In this sense, the failure model of Flink is baked into the way the channels are multiplexed, which is a bad thing (as you already noticed with your improved failover strategy).

If you want, let me look into this issue at a high level, and let's fix this together as a first step. Let's have a chat about this by the end of the week. Does this work for you?

After that, we can continue with the flow control issue, which is definitely a bigger task.

– Ufuk