The default member-timeout is 5 seconds. On an unpredictable network, or on a system prone to GC pauses, we might want to use a longer member-timeout in deployment. Network-partition-detection isn't involved in that, though; it's just normal failure detection.

Where network-partition-detection would cause harm is in a small deployment, say 2 servers and 1 locator. If the "lead" server is kicked out, both the locator and the other server would shut down, because the total membership weight was 28 and 15 of that (more than half) was lost. They would all restart after a default delay of 1 minute using the auto-reconnect feature, which is enabled by default.
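
To make the weight arithmetic for the 2-server, 1-locator case concrete, here is a rough sketch in plain Java. It does not call any Geode API. The class and method names are made up for illustration, and the per-member weights (10 per server, 3 per locator, 5 extra for the lead server) are assumptions chosen because they reproduce the 28 and 15 quoted above. The "more than half lost" rule is also a simplification; the real threshold in the membership code may differ slightly.

```java
// Hypothetical sketch of the membership-weight arithmetic; not Geode code.
public class QuorumSketch {
    // Assumed weights, chosen to match the 28 / 15 figures in this thread.
    static final int SERVER_WEIGHT = 10;
    static final int LOCATOR_WEIGHT = 3;
    static final int LEAD_BONUS = 5;

    // Total weight of a view with one lead server among `servers` servers.
    static int totalWeight(int servers, int locators) {
        int lead = servers > 0 ? LEAD_BONUS : 0;
        return servers * SERVER_WEIGHT + locators * LOCATOR_WEIGHT + lead;
    }

    // Weight carried by the lead server alone.
    static int leadServerWeight() {
        return SERVER_WEIGHT + LEAD_BONUS;
    }

    // Simplified rule: quorum is lost when more than half the weight is gone.
    static boolean quorumLost(int lostWeight, int totalWeight) {
        return lostWeight * 2 > totalWeight;
    }

    public static void main(String[] args) {
        int total = totalWeight(2, 1);
        System.out.println("total weight: " + total);
        System.out.println("lose lead server: quorum lost = "
                + quorumLost(leadServerWeight(), total));
        System.out.println("lose other server: quorum lost = "
                + quorumLost(SERVER_WEIGHT, total));
    }
}
```

Under these assumptions, losing the non-lead server or the locator alone would not cross the threshold; only the loss of the lead server takes the survivors down with it.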


On 1/8/2016 8:13 AM, Real Wes Williams wrote:
What’s the level of concern here about members getting kicked out prematurely 
depending on the newly proposed default settings?  For instance, if the default 
suspect notification is 3 seconds and they are running in AWS or a mildly 
unpredictable network environment, a member could be kicked out.  What would be 
considered “safe” settings?

On Jan 7, 2016, at 4:18 PM, Bruce Schuchardt <[email protected]> wrote:

Another thing that's been discussed for a long time is turning on 
network-partition-detection by default.   It is a major problem for someone if 
a partition occurs and they are using persistence.  The disk-stores on all but 
one of the partitions have to be deleted and revoked.
