Dear Colin,

Thanks for the suggestions.

> For example, if a new node joins the cluster, it will have 0 failed connect 
> attempts, whereas the existing nodes will probably have more than 0.  So all 
> the clients will ignore every other node and pile on to the new one.  That's 
> not good


The existing behavior is not random when there is no connected or connecting 
node. leastLoadedNode() will always return the node that respects the 
connection backoff and has the largest array index in the cached node list. 
The shuffle only happens after a metadata fetch, so when the client is not 
able to fetch metadata, the cached nodes won’t get shuffled. That is why I 
proposed considering the failed attempts together with the connection backoff. 

The potential issue you mentioned makes sense. One alternative I can think of 
is to randomly pick a disconnected node that respects the connection backoff.
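
To make that alternative concrete, here is a rough sketch. It is not the 
actual client code: NodeState, reconnectAllowedAtMs and pick() are 
hypothetical names I made up for illustration, standing in for whatever 
per-node connection state the client keeps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomBackoffNodePicker {
    // Hypothetical stand-in for the client's per-node connection state.
    static final class NodeState {
        final String id;
        final boolean disconnected;
        final long reconnectAllowedAtMs; // earliest time a reconnect may start

        NodeState(String id, boolean disconnected, long reconnectAllowedAtMs) {
            this.id = id;
            this.disconnected = disconnected;
            this.reconnectAllowedAtMs = reconnectAllowedAtMs;
        }
    }

    // Returns the id of a uniformly random disconnected node whose reconnect
    // backoff has elapsed, or null if no node qualifies.
    static String pick(List<NodeState> nodes, long nowMs, Random random) {
        List<NodeState> candidates = new ArrayList<>();
        for (NodeState node : nodes) {
            if (node.disconnected && nowMs >= node.reconnectAllowedAtMs) {
                candidates.add(node);
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        return candidates.get(random.nextInt(candidates.size())).id;
    }
}
```

Because the choice is uniform over the eligible nodes, a newly joined node 
gets no special preference, and the result no longer depends on the order of 
the cached node list.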

> Consider the case where we need to talk to the controller but it is not 
> responding.  With the current proposal we will keep trying to reconnect every 
> 10 seconds.  That could lead to more reconnection attempts than what happens 
> today.  In the rare case where the node is taking more than 10 seconds to 
> process new connections, it will prevent us from connecting completely.

An exponential timeout makes sense. I also have some thoughts about parameter 
tuning. Since Java NIO will time out and retry the socket channel connection 
exponentially after 1s, 2s, 4s, 8s, …, we’d better make the default value a 
power of 2, since the cumulative timeout of the first x tries by Java NIO is 
2^x - 1. 

For example, if socket.connection.setup.timeout = 10, Java NIO will only get 
to use a maximum timeout of 4, since 1 + 2 + 4 = 7 and the last try is left 
with only the remaining 3s, which is useless. However, if we set 
socket.connection.setup.timeout = 8 or 16, the last try won’t get wasted, 
since 1 + 2 + 4 = 7 and 1 + 2 + 4 + 8 = 15.
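
The arithmetic can be checked with a small sketch. It only assumes the 1s, 
2s, 4s, … schedule described above; ConnectBudget and split() are names I 
made up for illustration and are not part of any existing API.

```java
final class ConnectBudget {
    // Splits a setup timeout (in seconds) across retry windows of 1s, 2s, 4s, ...
    // Returns {number of full windows, seconds used by them, seconds left over}.
    static int[] split(int budgetSeconds) {
        int window = 1;       // length of the next retry window
        int used = 0;         // total time consumed by completed windows
        int fullWindows = 0;  // windows that fit entirely within the budget
        while (used + window <= budgetSeconds) {
            used += window;
            window *= 2;
            fullWindows++;
        }
        return new int[] {fullWindows, used, budgetSeconds - used};
    }
}
```

split(10) gives {3, 7, 3}: three full windows and 3s left over. split(8) 
gives {3, 7, 1} and split(16) gives {4, 15, 1}, so with a power of 2 only 1s 
is left over after the last full window.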


Please let me know what you think. Thanks.

Best, - Cheng Tan



> On May 18, 2020, at 1:32 PM, Colin McCabe <cmcc...@apache.org> wrote:
> 
> Hi Cheng,
> 
> socket.connection.setup.timeout.ms seems more consistent with our existing 
> configuration names than socket.connections.setup.timeout.ms (with an s).  
> What do you think?
> 
>> If no connected or connecting node exists, provide the disconnected node 
>> which
>> respects the reconnect backoff with the least number of failed attempts.
> 
> I think we need to rethink this part.  For example, if a new node joins the 
> cluster, it will have 0 failed connect attempts, whereas the existing nodes 
> will probably have more than 0.  So all the clients will ignore every other 
> node and pile on to the new one.  That's not good.  I think we should just 
> keep the existing random behavior.  If the node isn't blacklisted due to 
> connection backoff, it should be fair game to be connected to.
> 
> On a related note, I think it would be good to have an exponential connection 
> setup timeout backoff, similar to what we do with reconnect backoff.
> 
> Consider the case where we need to talk to the controller but it is not 
> responding.  With the current proposal we will keep trying to reconnect every 
> 10 seconds.  That could lead to more reconnection attempts than what happens 
> today.  In the rare case where the node is taking more than 10 seconds to 
> process new connections, it will prevent us from connecting completely.
> 
> An exponential strategy could start at 10 seconds, then do 20, then 40, then 
> 80, up to some limit.  That would reduce the extra load and also handle the 
> (hopefully very rare) case where connections are taking a long time to 
> connect.
> 
> best,
> Colin
> 
