Dear Colin,
Thanks for the suggestions.

> For example, if a new node joins the cluster, it will have 0 failed connect
> attempts, whereas the existing nodes will probably have more than 0. So all
> the clients will ignore every other node and pile on to the new one. That's
> not good

The existing behavior is not random when there’s no connected or connecting node. leastLoadedNode() will always return the node that respects the connection backoff and has the largest array index in the cached node list. The shuffle only happens after a metadata fetch. Thus, when the client is not able to fetch metadata, the cached nodes won’t get shuffled. That is why I proposed considering the failed attempts together with the connection backoff.

The potential issue you mentioned makes sense. An alternative would be to randomly pick a disconnected node that respects the connection backoff.

> Consider the case where we need to talk to the controller but it is not
> responding. With the current proposal we will keep trying to reconnect every
> 10 seconds. That could lead to more reconnection attempts than what happens
> today. In the rare case where the node is taking more than 10 seconds to
> process new connections, it will prevent us from connecting completely.

An exponential timeout makes sense. I also have some thoughts about parameter tuning. Since Java NIO will time out and retry the socket channel connection exponentially after 1s, 2s, 4s, 8s, …, we’d better make the default value a power of 2, since the sum of the timeouts used by Java NIO is 2^x - 1.

For example, if socket.connection.setup.timeout = 10, Java NIO will only get a chance to complete tries with a maximum timeout of 4s, since 1 + 2 + 4 = 7, and the last try is cut off after at most 3s, which is useless. However, if we set socket.connection.setup.timeout = 8 or 16, the last try won’t get wasted, since 1 + 2 + 4 = 7 and 1 + 2 + 4 + 8 = 15.

Please let me know what you think. Thanks.
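To make the arithmetic above concrete, here is a small sketch. It assumes the retry schedule really is 1s, 2s, 4s, 8s, … as described above, and the class and method names (SetupTimeoutBudget, usedSeconds, leftoverSeconds) are made up for illustration only; they are not part of the client code.

```java
public class SetupTimeoutBudget {

    /**
     * Sum of the doubling attempts (1s, 2s, 4s, ...) that fit entirely
     * within the given setup timeout. Assumes the 1s/2s/4s/... schedule
     * described in this email.
     */
    static long usedSeconds(long timeoutSec) {
        long used = 0;
        long next = 1;
        while (used + next <= timeoutSec) {
            used += next;
            next *= 2;
        }
        return used;
    }

    /** Time left over after the last attempt that fits completely. */
    static long leftoverSeconds(long timeoutSec) {
        return timeoutSec - usedSeconds(timeoutSec);
    }

    public static void main(String[] args) {
        for (long t : new long[] {8, 10, 16}) {
            System.out.println("timeout=" + t + "s"
                    + " used=" + usedSeconds(t) + "s"
                    + " leftover=" + leftoverSeconds(t) + "s");
        }
    }
}
```

For timeouts of 8, 10 and 16 seconds this reports 7, 7 and 15 seconds used, so the 10 second setting buys nothing over 8 except a partial attempt of about 3 seconds.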
Best,
- Cheng Tan

> On May 18, 2020, at 1:32 PM, Colin McCabe <cmcc...@apache.org> wrote:
>
> Hi Cheng,
>
> socket.connection.setup.timeout.ms seems more consistent with our existing
> configuration names than socket.connections.setup.timeout.ms (with an s).
> What do you think?
>
>> If no connected or connecting node exists, provide the disconnected node
>> which
>> respects the reconnect backoff with the least number of failed attempts.
>
> I think we need to rethink this part. For example, if a new node joins the
> cluster, it will have 0 failed connect attempts, whereas the existing nodes
> will probably have more than 0. So all the clients will ignore every other
> node and pile on to the new one. That's not good. I think we should just
> keep the existing random behavior. If the node isn't blacklisted due to
> connection backoff, it should be fair game to be connected to.
>
> On a related note, I think it would be good to have an exponential connection
> setup timeout backoff, similar to what we do with reconnect backoff.
>
> Consider the case where we need to talk to the controller but it is not
> responding. With the current proposal we will keep trying to reconnect every
> 10 seconds. That could lead to more reconnection attempts than what happens
> today. In the rare case where the node is taking more than 10 seconds to
> process new connections, it will prevent us from connecting completely.
>
> An exponential strategy could start at 10 seconds, then do 20, then 40, then
> 80, up to some limit. That would reduce the extra load and also handle the
> (hopefully very rare) case where connections are taking a long time to
> connect.
>
> best,
> Colin
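
P.S. As a rough illustration of the exponential strategy quoted above (10, 20, 40, 80 seconds, up to some limit), the per-attempt setup timeout could be computed along these lines. This is only a sketch: the class name, the method name and the idea of passing an explicit cap are my assumptions, not an existing client API or an agreed configuration.

```java
public class ExponentialSetupTimeout {

    /**
     * Connection setup timeout for the next attempt, doubling after each
     * consecutive failure and capped at maxMs. All names here are
     * hypothetical and chosen for illustration.
     */
    static long timeoutMs(long baseMs, long maxMs, int failures) {
        long timeout = Math.min(baseMs, maxMs);
        for (int i = 0; i < failures && timeout < maxMs; i++) {
            // Guard against overflow before doubling.
            timeout = (timeout > maxMs / 2) ? maxMs : timeout * 2;
        }
        return timeout;
    }

    public static void main(String[] args) {
        long baseMs = 10_000L;   // start at 10 seconds, as in the quoted example
        long maxMs = 127_000L;   // assumed cap, picked only for this demo
        for (int failures = 0; failures <= 5; failures++) {
            System.out.println("failures=" + failures
                    + " timeoutMs=" + timeoutMs(baseMs, maxMs, failures));
        }
    }
}
```

With a 10 second base this yields 10s, 20s, 40s and 80s for the first four attempts and then stays at whatever cap is chosen.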