The default member-timeout is 5 seconds. For an unpredictable network
or a system prone to GC pauses, we might want to use a longer
member-timeout in deployment. Network-partition-detection isn't involved
in that, though; it's just normal failure detection.
Where network-partition-detection would cause harm is in a small
deployment: say 2 servers and 1 locator. If the "lead" server is kicked
out, both the locator and the other server would shut down, because
the total membership weight was 28 and 15 of that (more than half) was
lost. They would all restart after a default delay of 1 minute using the
auto-reconnect feature, which is enabled by default.
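
To make the arithmetic concrete, here is a rough sketch of the weight calculation for that 2-server, 1-locator case. The per-member weights (10 per server, 3 per locator, plus 5 for the lead server) and the "more than half of the weight lost" rule are my assumptions for illustration; only the totals of 28 and 15 come from the numbers above. The function and constant names are made up for this example.

```python
# Assumed weights, chosen so they reproduce the 28 and 15 quoted above.
SERVER_WEIGHT = 10   # assumption: weight of each cache server
LOCATOR_WEIGHT = 3   # assumption: weight of each locator
LEAD_BONUS = 5       # assumption: extra weight carried by the lead server


def total_weight(servers, locators):
    """Total membership weight, counting exactly one server as the lead."""
    lead = LEAD_BONUS if servers > 0 else 0
    return servers * SERVER_WEIGHT + locators * LOCATOR_WEIGHT + lead


def quorum_lost(lost, total):
    """Assumed rule: a partition is declared when more than half the weight is lost."""
    return lost * 2 > total


total = total_weight(servers=2, locators=1)   # 10 + 10 + 3 + 5
lost = SERVER_WEIGHT + LEAD_BONUS             # the lead server is kicked out
surviving = total - lost                      # locator plus the other server

print(f"total={total} lost={lost} surviving={surviving}")
print(f"lost fraction={lost / total:.1%} quorum lost={quorum_lost(lost, total)}")
```

With these assumed weights the surviving locator and server hold 13 of 28, so losing the single lead server is enough to take the rest down.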
On 1/8/2016 8:13 AM, Real Wes Williams wrote:
What’s the level of concern here about members getting kicked out prematurely
depending on the newly proposed default settings? For instance, if the default
suspect notification is 3 seconds and they are running in AWS or a mildly
unpredictable network environment, a member could be kicked out. What would be
considered “safe” settings?
On Jan 7, 2016, at 4:18 PM, Bruce Schuchardt wrote:
Another thing that's been discussed for a long time is turning on
network-partition-detection by default. It is a major problem for someone if
a partition occurs and they are using persistence. The disk-stores on all but
one of the partitions have to be deleted and revoked.