The following tickets are probably relevant to this KIP: https://issues.apache.org/jira/browse/KAFKA-3457 https://issues.apache.org/jira/browse/KAFKA-1894 https://issues.apache.org/jira/browse/KAFKA-3834
On 22 May 2017 at 16:30, Rajini Sivaram <rajinisiva...@gmail.com> wrote: > Ismael, > > Yes, agree. My concern was that a connection can be shutdown uncleanly at > any time. If a client is in the middle of a request, then it times out > after min(request.timeout.ms, tcp-timeout). If we add another config > option > connect.timeout.ms, then we will sometimes wait for min(connect.timeout.ms > , > tcp-timeout) and sometimes for min(request.timeout.ms, tcp-timeout), > depending > on connection state. One config option feels neater to me. > > On Mon, May 22, 2017 at 11:21 AM, Ismael Juma <ism...@juma.me.uk> wrote: > > > Rajini, > > > > For this to have the desired effect, we'd probably need to lower the > > default request.timeout.ms for the consumer and fix the underlying > reason > > why it is a little over 5 minutes at the moment. > > > > Ismael > > > > On Mon, May 22, 2017 at 4:15 PM, Rajini Sivaram <rajinisiva...@gmail.com > > > > wrote: > > > > > Hi David, > > > > > > Sorry, what I meant was: Can you reuse the existing configuration > option > > > request.timeout,ms , instead of adding a new config and add the > behaviour > > > that you have proposed in the KIP for the connection phase using this > > > timeout? I think the timeout for connection is useful. I am not sure we > > > need another configuration option to implement it. > > > > > > Regards, > > > > > > Rajini > > > > > > > > > On Mon, May 22, 2017 at 11:06 AM, 东方甲乙 <254479...@qq.com> wrote: > > > > > > > Hi Rajini. > > > > > > > > When kafka node' machine is shutdown or network is closed, the > > connecting > > > > phase could not use the request.timeout.ms, because the client > haven't > > > > send a req yet. And no response for the nio, the selector will not > > > close > > > > the connect, so it will not choose other good node to get the > metadata. > > > > > > > > > > > > Thanks > > > > David > > > > > > > > ------------------ 原始邮件 ------------------ > > > > *发件人:* "Rajini Sivaram" <rajinisiva...@gmail.com>; > > > > *发送时间:* 2017年5月22日(星期一) 20:17 > > > > *收件人:* "dev" <dev@kafka.apache.org>; > > > > *主题:* Re: [DISCUSS] KIP-148: Add a connect timeout for client > > > > > > > > > > > > Hi David, > > > > > > > > Is there a reason why you wouldn't want to use request.timeout.ms as > > the > > > > timeout parameter for connections? Then you would use the same > timeout > > > for > > > > connected and connecting phases when shutdown is unclean. You could > > still > > > > use the timeout to ensure that next metadata request is sent to > another > > > > node. > > > > > > > > Regards, > > > > > > > > Rajini > > > > > > > > On Sun, May 21, 2017 at 9:51 AM, 东方甲乙 <254479...@qq.com> wrote: > > > > > > > > > Hi Guozhang, > > > > > > > > > > > > > > > Thanks for the clarify. For the clarify 2, I think the key thing is > > not > > > > > users control how much time in maximum to wait for inside code, but > > is > > > > the > > > > > network client can be aware of the connecting can't be finished and > > > try a > > > > > good node. In the producer.sender even the selector.poll can > timeout, > > > but > > > > > the next time is also not close the previous connecting and try > > another > > > > > good node. > > > > > > > > > > > > > > > In out test env, QA shutdown one of the leader node, the producer > > send > > > > the > > > > > request will timeout and close the node's connection then request > the > > > > > metadata. But sometimes the request node is also the shutdown > node. > > > > When > > > > > connecting the shutting down node to get the metadata, it is in the > > > > > connecting phase, network client mark the connecting node's state > to > > > > > CONNECTING, but if the node is shutdown, the socket can't be aware > > of > > > > the > > > > > connecting is broken. Though the selector.poll has timeout > parameter, > > > but > > > > > it will not close the connection, so the next > > > > > time in the "networkclient.maybeUpdate" it will check if > > > > > isAnyNodeConnecting, then will not connect to any good node the get > > the > > > > > metadata. It need about several minutes to > > > > > aware the connecting is timeout and try other node. > > > > > > > > > > > > > > > So I want to add a connect.timeout parameter, the selector can > find > > > the > > > > > connecting is timeout and close the connection. It seems the > > currently > > > > the > > > > > timeout value passed in `selector.poll()` > > > > > seems can not do this. > > > > > > > > > > > > > > > Thanks, > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------ 原始邮件 ------------------ > > > > > 发件人: "Guozhang Wang";<wangg...@gmail.com>; > > > > > 发送时间: 2017年5月16日(星期二) 凌晨1:51 > > > > > 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>; > > > > > > > > > > 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > I may be a bit confused before, just clarifying a few things: > > > > > > > > > > 1. As you mentioned, a client will always try to first establish > the > > > > > connection with a broker node before it tries to send any request > to > > > it. > > > > > And after connection is established, it will either continuously > send > > > > many > > > > > requests (e.g. produce) for just a single request (e.g. metadata) > to > > > the > > > > > broker, so these two phases are indeed different. > > > > > > > > > > 2. In the connected phase, connections.max.idle.ms is used to > > > > > auto-disconnect the socket if no requests has been sent / received > > > during > > > > > that period of time; in the connecting phase, we always try to > create > > > the > > > > > socket via "socketChannel.connect" in a non-blocking call, and then > > > > checks > > > > > if the connection has been established, but all the callers of this > > > > > function (in either producer or consumer) has a timeout parameter > as > > in > > > > > `selector.poll()`, and the timeout parameter is set either by > > > > calculations > > > > > based on metadata.expiration.time and backoff for producer#sender, > or > > > by > > > > > directly passed values from consumer#poll(timeout), so although > there > > > is > > > > no > > > > > directly config controlling that, users can still control how much > > time > > > > in > > > > > maximum to wait for inside code. > > > > > > > > > > I originally thought your scenarios is more on the connected phase, > > but > > > > now > > > > > I feel you are talking about the connecting phase. For that case, I > > > still > > > > > feel currently the timeout value passed in `selector.poll()` which > is > > > > > controllable from user code should be sufficient? > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, May 14, 2017 at 2:37 AM, 东方甲乙 <254479...@qq.com> wrote: > > > > > > > > > > > Hi Guozhang, > > > > > > > > > > > > > > > > > > Sorry for the delay, thanks for the question. It seems two > > different > > > > > > parameters to me: > > > > > > connect.timeout.ms: only work for the connecting phrase, after > > > > connected > > > > > > phrase this parameter is not used. > > > > > > connections.max.idle.ms: currently not work in the connecting > > phrase > > > > > > (only select return readyKeys >0) will add to the expired > manager, > > > > after > > > > > > connected will check if the connection is still alive in some > time. > > > > > > > > > > > > > > > > > > Even if we change the connections.max.idle.ms to work including > > the > > > > > > connecting phrase, we can not set this parameter to a small > value, > > > such > > > > > as > > > > > > 5 seconds. Because the client is maybe busy sending message to > > other > > > > > node, > > > > > > it will be disconnected in 5 seconds, so the default value of > > > > > > connections.max.idle.ms is setting to a larger time. We should > > have > > > > two > > > > > > parameters to control the connecting phrase behavior and the > > > connected > > > > > > phrase behavior, do you think so? > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------ 原始邮件 ------------------ > > > > > > 发件人: "Guozhang Wang";<wangg...@gmail.com>; > > > > > > 发送时间: 2017年5月6日(星期六) 上午7:52 > > > > > > 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>; > > > > > > > > > > > > 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client > > > > > > > > > > > > > > > > > > > > > > > > Hello David, > > > > > > > > > > > > Thanks for the KIP. For the described issue, I'm wondering if it > > can > > > be > > > > > > resolved by tuning the CONNECTIONS_MAX_IDLE_MS_CONFIG ( > > > > > > connections.max.idle.ms) on the client side? Default is 9 > minutes. > > > > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > On Tue, May 2, 2017 at 8:22 AM, 东方甲乙 <254479...@qq.com> wrote: > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > Currently in our test environment, we found that after one of > the > > > > > broker > > > > > > > node crash (reboot or os crash), the client may still be > > connecting > > > > to > > > > > > the > > > > > > > crash node to send metadata request or other request, and it > > needs > > > > > > several > > > > > > > minutes to be aware that the connection is timeout then try > > another > > > > > node > > > > > > to > > > > > > > connect to send the request. Then the client may still not be > > aware > > > > of > > > > > > the > > > > > > > metadata change after several minutes. > > > > > > > > > > > > > > > > > > > > > So I want to add a connect timeout on the client, please > take a > > > > look > > > > > > at: > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > 148%3A+Add+a+connect+timeout+for+client > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > > -- [image: cake_logo_strap_screen 400.jpg] <http://www.cakesolutions.net> Simon Souter (Office) 0845 617 1200 Houldsworth Mill, Houldsworth Street, Reddish, Stockport, SK5 6DA, UK [image: twitter-circle-darkgrey.png] <https://twitter.com/cakesolutions> [image: facebook-circle-darkgrey.png] <https://www.facebook.com/cakesolutionslimited/> [image: linkedin-circle-darkgrey.png] <https://www.linkedin.com/company/cake-solutions-limited> [image: Reactive Applications] <https://cakesolutions.sigstr.net/uc/588780e6825be936ed5682e0> Company registered in the UK, No. 4184567 If you have received this e-mail in error, please accept our apologies, destroy it immediately, and it would be greatly appreciated if you notified the sender. It is your responsibility to protect your system from viruses and any other harmful code or device. We try to eliminate them from e-mails and attachments, but we accept no liability for any which remain. We may monitor or access any or all e-mails sent to us. [image: Powered by Sigstr] <https://cakesolutions.sigstr.net/uc/588780e6825be936ed5682e0/watermark>