Dear Colin,

Thanks for the reply. Your reasoning makes sense. I’ve modified KIP-601 with 
two changes:

1. KIP-601 now proposes an exponential connection setup timeout, which is 
controlled by socket.connections.setup.timeout.ms (initial value) and 
socket.connections.setup.timeout.max.ms (maximum value).

2. The logic optimization in leastLoadedNode(), which I would like to discuss 
again. In the scenario where no connected or connecting node exists, instead of 
providing the node with the fewest failed attempts, the NetworkClient can provide 
the least recently used node that respects the reconnect backoff. The existing 
property ClusterConnectionStates.NodeConnectionState.lastConnectAttemptMs can 
help us pick the LRU node conveniently. Does this make sense to you?
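
To make the two changes concrete, here is a rough sketch. It is not the 
actual NetworkClient code: the class ConnectionSetupSketch and both method 
names are made up for illustration, the doubling factor is an assumption (the 
KIP text only says "exponential"), and a plain map from node id to last 
connect attempt time stands in for lastConnectAttemptMs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; none of these names exist in Kafka.
final class ConnectionSetupSketch {

    // Change 1: the setup timeout starts at initMs and doubles after each
    // consecutive failed attempt (assumed factor of 2), capped at maxMs.
    static long setupTimeoutMs(long initMs, long maxMs, int failedAttempts) {
        long timeout = initMs;
        for (int i = 0; i < failedAttempts; i++) {
            if (timeout > maxMs / 2) {
                return maxMs; // doubling would exceed the cap
            }
            timeout *= 2;
        }
        return Math.min(timeout, maxMs);
    }

    // Change 2: among nodes whose reconnect backoff has elapsed, pick the one
    // with the oldest last connect attempt (the LRU node). Returns null when
    // every node is still backing off. A value of 0 means "never attempted".
    static String leastRecentlyUsedNode(Map<String, Long> lastConnectAttemptMs,
                                        long nowMs,
                                        long reconnectBackoffMs) {
        String best = null;
        long oldest = Long.MAX_VALUE;
        for (Map.Entry<String, Long> entry : lastConnectAttemptMs.entrySet()) {
            long last = entry.getValue();
            if (nowMs - last < reconnectBackoffMs) {
                continue; // still inside the reconnect backoff window
            }
            if (last < oldest) {
                oldest = last;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical values: 10 s initial timeout, 127 s maximum.
        for (int failures = 0; failures <= 5; failures++) {
            System.out.println(failures + " failures: "
                    + setupTimeoutMs(10_000L, 127_000L, failures) + " ms");
        }

        Map<String, Long> attempts = new LinkedHashMap<>();
        attempts.put("node-1", 5_000L);
        attempts.put("node-2", 1_000L);
        attempts.put("node-3", 9_900L); // attempted 100 ms ago, still backing off
        System.out.println("LRU node: "
                + leastRecentlyUsedNode(attempts, 10_000L, 500L));
    }
}
```

The point of the second method is that a node we tried long ago gets the next 
attempt, so repeated failures against one node do not starve the others.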

Please let me know what you think. Thanks.


Best, - Cheng Tan



> On May 19, 2020, at 1:44 PM, Colin McCabe <cmcc...@apache.org> wrote:
> 
> It seems like this analysis is assuming that the only reason to wait longer 
> is so that we can send another SYN packet.  This may not be the case-- 
> perhaps waiting longer would allow us to receive an ACK from the remote end 
> that has been delayed for some reason while going through the network.
> 
> We also probably don't want our expiration time period to line up exactly 
> with Linux's retries.  If it did, we would cut off the connection attempt 
> just as we were re-sending another SYN.
> 
> Also, there are other OSes besides Linux, and other configurations besides 
> the default one.
> 
> So, on the whole, I don't think we need to make the default a power of 2.
> 
> best,
> Colin
> 
