Hi Rajini. 

When kafka node' machine is shutdown or network is closed, the connecting phase 
could not use the request.timeout.ms, because the client haven't send a req 
yet.   And no response for the nio, the selector will not close the connect, so 
it will not choose other good node to get the metadata.

 
Thanks
David

------------------ 原始邮件 ------------------
发件人: "Rajini Sivaram" <rajinisiva...@gmail.com>;
发送时间: 2017年5月22日(星期一) 20:17
收件人: "dev" <dev@kafka.apache.org>;
主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client



Hi David,

Is there a reason why you wouldn't want to use request.timeout.ms as the
timeout parameter for connections? Then you would use the same timeout for
connected and connecting phases when shutdown is unclean. You could still
use the timeout to ensure that next metadata request is sent to another
node.

Regards,

Rajini

On Sun, May 21, 2017 at 9:51 AM, 东方甲乙 <254479...@qq.com> wrote:

> Hi Guozhang,
>
>
> Thanks for the clarify. For the clarify 2, I think the key thing is not
> users control how much time in maximum to wait for inside code, but is the
> network client can be aware of the connecting can't be finished and try a
> good node. In the producer.sender even the selector.poll can timeout, but
> the next time is also not close the previous connecting and try another
> good node.
>
>
> In out test env, QA shutdown one of the leader node, the producer send the
> request will timeout and close the node's connection then request the
> metadata.  But sometimes the request node is also the shutdown node.  When
> connecting the shutting down node to get the metadata, it is in the
> connecting phase, network client mark the connecting node's state to
> CONNECTING, but if the node is shutdown,  the socket can't be aware of the
> connecting is broken. Though the selector.poll has timeout parameter, but
> it will not close the connection, so the next
> time in the "networkclient.maybeUpdate" it will check if
> isAnyNodeConnecting, then will not connect to any good node the get the
> metadata.  It need about several minutes to
> aware the connecting is timeout and try other node.
>
>
> So I want to add a connect.timeout parameter,  the selector can find the
> connecting is timeout and close the connection.  It seems the currently the
> timeout value passed in `selector.poll()`
> seems can not do this.
>
>
> Thanks,
> David
>
>
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "Guozhang Wang";<wangg...@gmail.com>;
> 发送时间: 2017年5月16日(星期二) 凌晨1:51
> 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>;
>
> 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client
>
>
>
> Hi David,
>
> I may be a bit confused before, just clarifying a few things:
>
> 1. As you mentioned, a client will always try to first establish the
> connection with a broker node before it tries to send any request to it.
> And after connection is established, it will either continuously send many
> requests (e.g. produce) for just a single request (e.g. metadata) to the
> broker, so these two phases are indeed different.
>
> 2. In the connected phase, connections.max.idle.ms is used to
> auto-disconnect the socket if no requests has been sent / received during
> that period of time; in the connecting phase, we always try to create the
> socket via "socketChannel.connect" in a non-blocking call, and then checks
> if the connection has been established, but all the callers of this
> function (in either producer or consumer) has a timeout parameter as in
> `selector.poll()`, and the timeout parameter is set either by calculations
> based on metadata.expiration.time and backoff for producer#sender, or by
> directly passed values from consumer#poll(timeout), so although there is no
> directly config controlling that, users can still control how much time in
> maximum to wait for inside code.
>
> I originally thought your scenarios is more on the connected phase, but now
> I feel you are talking about the connecting phase. For that case, I still
> feel currently the timeout value passed in `selector.poll()` which is
> controllable from user code should be sufficient?
>
>
> Guozhang
>
>
>
>
> On Sun, May 14, 2017 at 2:37 AM, 东方甲乙 <254479...@qq.com> wrote:
>
> > Hi Guozhang,
> >
> >
> > Sorry for the delay, thanks for the question.  It seems two different
> > parameters to me:
> > connect.timeout.ms: only work for the connecting phrase, after connected
> > phrase this parameter is not used.
> > connections.max.idle.ms: currently not work in the connecting phrase
> > (only select return readyKeys >0) will add to the expired manager, after
> > connected will check if the connection is still alive in some time.
> >
> >
> > Even if we change the connections.max.idle.ms to work including the
> > connecting phrase, we can not set this parameter to a small value, such
> as
> > 5 seconds. Because the client is maybe busy sending message to other
> node,
> > it will be disconnected in 5 seconds, so the default value of
> > connections.max.idle.ms is setting to a larger time. We should have two
> > parameters to control the connecting phrase behavior and the connected
> > phrase behavior, do you think so?
> >
> >
> > Thanks,
> >
> >
> > David
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "Guozhang Wang";<wangg...@gmail.com>;
> > 发送时间: 2017年5月6日(星期六) 上午7:52
> > 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>;
> >
> > 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> >
> >
> >
> > Hello David,
> >
> > Thanks for the KIP. For the described issue, I'm wondering if it can be
> > resolved by tuning the CONNECTIONS_MAX_IDLE_MS_CONFIG (
> > connections.max.idle.ms) on the client side? Default is 9 minutes.
> >
> >
> > Guozhang
> >
> > On Tue, May 2, 2017 at 8:22 AM, 东方甲乙 <254479...@qq.com> wrote:
> >
> > > Hi all,
> > >
> > > Currently in our test environment, we found that after one of the
> broker
> > > node crash (reboot or os crash), the client may still be connecting to
> > the
> > > crash node to send metadata request or other request, and it needs
> > several
> > > minutes to be aware that the connection is timeout then try another
> node
> > to
> > > connect to send the request. Then the client may still not be aware of
> > the
> > > metadata change after several minutes.
> > >
> > >
> > > So I want to add a connect timeout on the  client,  please take a look
> > at:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 148%3A+Add+a+connect+timeout+for+client
> > >
> > > Regards,
> > >
> > > David
> >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>

Reply via email to