Hi Cheng,

Thanks for the updates, looks good.

+1 (binding)

Regards,

Rajini

On Wed, Jun 3, 2020 at 8:53 AM Cheng Tan <c...@confluent.io> wrote:

> Dear Rajini,
>
> Thanks for the feedback.
>
> 1)
> Because "request.timeout.ms" only affects in-flight requests, after the
> API NetworkClient.ready() is invoked, the connection won't get closed after
> "request.timeout.ms” hits. Before
>         a) the SocketChannel is connected
>         b) ssl handshake finished
>         c) authentication has finished (sasl)
> clients cannot invoke NetworkClient.send() to send any request, which
> means no in-flight request targeting to the connection will be added.
>
>
> 2)
> I think a default value of 127 seconds make sense, which meets the timeout
> indirectly specified by the default value of “tcp.syn.retries”. I’ve added
> this into the KIP proposal.
>
>
> 3)
> Every time the timeout hits, the timeout value of the next connection try
> will increase.
>
> The timeout will hit iff a connection stays at the `connecting` state
> longer than the timeout value, as indicated by
> ClusterConnectionStates.NodeConnectionState. The connection state of a node
> will change iff `SelectionKey.OP_CONNECT` is detected by
> `nioSelector.Select()`. The connection state may transit from `connecting`
> to
>
>         a) `disconnected` when SocketChannel.finishConnect() throws
> IOException.
>         b) `connected` when SocketChannel.finishConnect() return TRUE.
>
> In other words, the timeout will hit and increase iff the interested
> SelectionKey.OP_CONNECT doesn't trigger before the timeout arrives, which
> means, for example, network congestion, failure of the ARP request, packet
> filtering, routing error, or a silent discard may happen. (I didn’t read
> the Java NIO source code. Please correct me the case when OP_CONNECT won’t
> get triggered if I’m wrong)
>
>
> 4)
>
> A) Connection timeout dominates both request timeout and API timeout
>
> When connection timeout hits, the connection will be closed. The client
> will be notified either by the responses constructed by NetworkClient or
> the callbacks attached to the request. As a result, the request failure
> will be handled before either connection timeout or API timeout arrives.
>
>
> B) Neither request timeout or API timeout dominates connection timeout
>
> i) Request timeout: Because request timeout only affects in-flight
> requests, after the API NetworkClient.ready() is invoked, the connection
> won't get closed after "request.timeout.ms” hits. Before
>         1. the SocketChannel is connected
>         2. SSL handshake finished
>         3. authentication has finished (SASL)
> , clients won't be able to invoke NetworkClient.send() to send any
> request, which means no in-flight request targeting to the connection will
> be added.
>
> ii) API timeout: In AdminClient, API timeout acts by putting a smaller and
> smaller timeout value to the chain of requests in a same API. After the API
> timeout hits, the retry logic won't close any connection. In consumer, API
> timeout acts as a whole by putting a limit to the code block executing
> time. The retry logic won't close any connection as well.
>
>
> Conclusion:
>
> Thanks again for the long feedback and I’m always enjoying them. I’ve
> supplement the above discussion into the KIP proposal. Please let me know
> what you think.
>
>
> Best, - Cheng Tan
>
>
> > On Jun 2, 2020, at 3:01 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
> >
> > Hi Cheng,
> >
> > Not sure if the discussion should move back to the DISCUSS thread. I
> have a
> > few questions:
> >
> > 1) The KIP motivation says that in some cases `request.timeout.ms`
> doesn't
> > timeout connections properly and as a result it takes 127s to detect a
> > connection failure. This sounds like a bug rather than a limitation of
> the
> > current approach. Can you explain the scenarios where this occurs?
> >
> > 2) I think the current proposal is to use non-exponential 10s connection
> > timeout as default with the option to use exponential timeout. So
> > connection timeouts for every connection attempt will be between 8s and
> 12s
> > by default. Is that correct? Should we use a default max timeout to
> enable
> > exponential timeout by default since 8s seems rather small?
> >
> > 3) What is the scope of `failures` used to determine connection timeout
> > with exponential timeouts? Will we always use 10s followed by 20s every
> > time a connection is attempted?
> >
> > 4) It will be good if we can include two flows with the relationship
> > between various timeouts in the KIP. One with a fixed node like a typical
> > produce/consume request to the leader and another that uses
> > `leastLoadedNode` like a metadata request. Having the comparison between
> > the current and proposed behaviour w.r.t all configurable timeouts (the
> two
> > new connection timeouts, request timeout, api timeout etc.) will be
> useful.
> >
> > Regards,
> >
> > Rajini
> >
>

Reply via email to