Dear Colin,

Thanks for the reply. Your reasoning makes sense. I’ve modified KIP-601 with two changes:

1. KIP-601 now proposes an exponential connection setup timeout, controlled by socket.connections.setup.timeout.ms (initial value) and socket.connections.setup.timeout.max.ms (maximum value).

2. The logic optimization in leastLoadedNode(), which I would like to discuss again. When no connected or connecting node exists, instead of providing the node with the fewest failed attempts, the NetworkClient can provide the least recently used node that respects the reconnect backoff. The existing property ClusterConnectionStates.NodeConnectionState.lastConnectAttemptMs makes it convenient to pick the LRU node.

Does this make sense to you? Please let me know what you think. Thanks.

Best,
- Cheng Tan

> On May 19, 2020, at 1:44 PM, Colin McCabe <cmcc...@apache.org> wrote:
>
> It seems like this analysis is assuming that the only reason to wait longer
> is so that we can send another SYN packet. This may not be the case--
> perhaps waiting longer would allow us to receive an ACK from the remote end
> that has been delayed for some reason while going through the network.
>
> We also probably don't want our expiration time period to line up exactly
> with Linux's retries. If it did, we would cut off the connection attempt
> just as we were re-sending another SYN.
>
> Also, there are other OSes besides Linux, and other configurations besides
> the default one.
>
> So, on the whole, I don't think we need to make the default a power of 2.
>
> best,
> Colin
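
P.S. To make the two KIP-601 changes concrete, here is a rough Java sketch. It is not the actual NetworkClient code: the class name, method names, the doubling factor, the use of plain node-id strings, and the numbers in main() are all assumptions made for illustration, and jitter is left out. It only shows the intended behavior: a setup timeout that grows from an initial value up to a maximum, and an LRU pick based on the last connect attempt time that skips nodes still inside their reconnect backoff.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Kafka.
final class ConnectionSetupSketch {

    // Setup timeout after `failures` consecutive failed attempts:
    // starts at initMs, doubles per failure (assumed factor), capped at maxMs.
    static long setupTimeoutMs(long initMs, long maxMs, int failures) {
        long timeout = initMs;
        for (int i = 0; i < failures && timeout < maxMs; i++) {
            timeout *= 2;
        }
        return Math.min(timeout, maxMs);
    }

    // Picks the node whose last connect attempt is oldest, considering only
    // nodes whose reconnect backoff has already elapsed. The map stands in
    // for the per-node lastConnectAttemptMs values.
    static Optional<String> leastRecentlyUsedNode(Map<String, Long> lastConnectAttemptMs,
                                                  long nowMs,
                                                  long reconnectBackoffMs) {
        String best = null;
        long bestAttempt = Long.MAX_VALUE;
        for (Map.Entry<String, Long> entry : lastConnectAttemptMs.entrySet()) {
            long last = entry.getValue();
            boolean backoffElapsed = nowMs - last >= reconnectBackoffMs;
            if (backoffElapsed && last < bestAttempt) {
                best = entry.getKey();
                bestAttempt = last;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Illustrative values, not proposed defaults.
        for (int failures = 0; failures <= 5; failures++) {
            System.out.println(failures + " failures: "
                    + setupTimeoutMs(10_000L, 127_000L, failures) + " ms");
        }

        Map<String, Long> attempts = new LinkedHashMap<>();
        attempts.put("node-a", 1_000L);
        attempts.put("node-b", 500L);
        attempts.put("node-c", 4_900L); // still backing off at now = 5000
        System.out.println(leastRecentlyUsedNode(attempts, 5_000L, 200L));
    }
}
```

An empty Optional here means every node is still backing off, in which case the caller would have to wait rather than retry a node early.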