Hi Cheng,

Not sure if the discussion should move back to the DISCUSS thread. I have a few questions:
1) The KIP motivation says that in some cases `request.timeout.ms` doesn't time out connections properly, so it can take 127s to detect a connection failure. This sounds like a bug rather than a limitation of the current approach. Can you explain the scenarios where this occurs?

2) I think the current proposal is to use a non-exponential 10s connection timeout as the default, with the option to use an exponential timeout. So connection timeouts for every connection attempt will be between 8s and 12s by default. Is that correct? Should we set a default max timeout to enable exponential timeouts by default, since 8s seems rather small?

3) What is the scope of `failures` used to determine the connection timeout when exponential timeouts are enabled? Will we always use 10s followed by 20s every time a connection is attempted?

4) It would be good to include two flows in the KIP showing the relationship between the various timeouts: one with a fixed node, like a typical produce/consume request to the leader, and another that uses `leastLoadedNode`, like a metadata request. A comparison of the current and proposed behaviour w.r.t. all configurable timeouts (the two new connection timeouts, request timeout, api timeout, etc.) would be useful.

Regards,

Rajini

On Wed, May 20, 2020 at 4:51 AM Cheng Tan <c...@confluent.io> wrote:

> Dear Colin,
>
> Thanks for the reply. Your reasoning makes sense. I've modified KIP-601
> with two changes:
>
> 1. KIP-601 now proposes an exponential connection setup timeout,
> controlled by socket.connections.setup.timeout.ms (initial value) and
> socket.connections.setup.timeout.max.ms (max value).
>
> 2. A logic optimization in leastLoadedNode(), which I want to discuss
> again. In the scenario where no connected or connecting node exists,
> instead of providing the node with the fewest failed attempts, the
> NetworkClient can provide the least recently used node that respects the
> reconnect backoff.
> The existing property
> ClusterConnectionStates.NodeConnectionState.lastConnectAttemptMs can help
> us pick the LRU node conveniently. Does this make sense to you?
>
> Please let me know what you think. Thanks.
>
> Best,
> - Cheng Tan
>
> > On May 19, 2020, at 1:44 PM, Colin McCabe <cmcc...@apache.org> wrote:
> >
> > It seems like this analysis assumes that the only reason to wait
> > longer is so that we can send another SYN packet. This may not be the
> > case; perhaps waiting longer would allow us to receive an ACK from the
> > remote end that has been delayed for some reason on its way through
> > the network.
> >
> > We also probably don't want our expiration period to line up exactly
> > with Linux's retries. If it did, we would cut off the connection
> > attempt just as we were re-sending another SYN.
> >
> > Also, there are other OSes besides Linux, and other configurations
> > besides the default one.
> >
> > So, on the whole, I don't think we need to make the default a power
> > of 2.
> >
> > best,
> > Colin
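For readers following along, the exponential setup timeout discussed above (initial value doubled per consecutive failure, capped at a max, with roughly +/-20% jitter, which is where the 8s-12s range in question 2 comes from) can be sketched as below. This is an illustration of the scheme, not the actual NetworkClient code; the constant names and the jitter factor are assumptions.

```java
import java.util.Random;

public class ConnectionSetupTimeout {
    // Illustrative values matching the discussion; the real defaults are
    // governed by socket.connections.setup.timeout.ms and
    // socket.connections.setup.timeout.max.ms.
    static final long SETUP_TIMEOUT_MS = 10_000;
    static final long SETUP_TIMEOUT_MAX_MS = 30_000;
    static final double JITTER = 0.2; // +/- 20% randomization

    // Timeout for the attempt after `failures` consecutive failures:
    // doubles each time, capped at the max, with jitter applied.
    static long setupTimeoutMs(int failures, Random rng) {
        double exp = SETUP_TIMEOUT_MS * Math.pow(2, failures);
        double capped = Math.min(exp, SETUP_TIMEOUT_MAX_MS);
        double jitter = 1 + JITTER * (2 * rng.nextDouble() - 1); // in [0.8, 1.2)
        return (long) (capped * jitter);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int failures = 0; failures < 4; failures++) {
            System.out.println(failures + " failures -> " + setupTimeoutMs(failures, rng) + " ms");
        }
    }
}
```

With these assumed values, the first attempt falls in [8s, 12s), later attempts double up to the 30s cap, which bears on question 3: whether `failures` resets per node, per connection attempt series, or globally.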
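Cheng's proposed leastLoadedNode() fallback (when no node is connected or connecting, pick the least recently attempted node among those whose reconnect backoff has elapsed) might look roughly like the sketch below. The class and field names here are simplified stand-ins; in the actual client this state lives in ClusterConnectionStates.NodeConnectionState.

```java
import java.util.List;

public class LruNodePicker {
    // Minimal stand-in for per-node connection state.
    static class NodeState {
        final String nodeId;
        final long lastConnectAttemptMs;
        final long reconnectBackoffMs;

        NodeState(String nodeId, long lastConnectAttemptMs, long reconnectBackoffMs) {
            this.nodeId = nodeId;
            this.lastConnectAttemptMs = lastConnectAttemptMs;
            this.reconnectBackoffMs = reconnectBackoffMs;
        }

        // A node respects the backoff if enough time has passed since the
        // last connection attempt.
        boolean canConnect(long nowMs) {
            return nowMs - lastConnectAttemptMs >= reconnectBackoffMs;
        }
    }

    // Among nodes whose reconnect backoff has elapsed, return the least
    // recently attempted one; null if every node is still backing off.
    static NodeState leastRecentlyUsed(List<NodeState> nodes, long nowMs) {
        NodeState best = null;
        for (NodeState n : nodes) {
            if (!n.canConnect(nowMs))
                continue;
            if (best == null || n.lastConnectAttemptMs < best.lastConnectAttemptMs)
                best = n;
        }
        return best;
    }
}
```

Compared with picking the node with the fewest failed attempts, this LRU rule naturally rotates through candidates while still honouring the reconnect backoff.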