[ https://issues.apache.org/jira/browse/IGNITE-22377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-22377:
---------------------------------------
    Summary: Choose node to fail on a refused handshake  (was: Choose node to fail on a failed handshake)

> Choose node to fail on a refused handshake
> ------------------------------------------
>
>                 Key: IGNITE-22377
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22377
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.1
>
>
> Currently, if during a handshake a node gets refused because it's stale from
> the point of view of the node to which it connects, the refused node notifies
> its FailureHandler to force a node restart.
> If a network partition happens, this might cause problems once the partition
> disappears: nodes from different segments will start sniping each other. In
> the worst case, a single segmented node might make the whole cluster (except
> itself) restart.
> It is suggested that the refusing node send the refused node the following
> information about the physical topology (PT) as it sees it:
> # Number of nodes in the PT
> # Min ID of nodes in the PT
> The refused node will only restart if the number of nodes in the PT, as it
> sees it, is less than the number of nodes in the PT of the refusing node; if
> the sizes are equal, then comparing the Min IDs of nodes in the PT will allow
> making a deterministic decision.
> This idea needs to be thought through and improved (or rejected).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
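
For illustration only, the decision rule proposed in the issue description could be sketched roughly as below. This is not code from Ignite: the class RefusedHandshakeDecision, the nested TopologySummary type and the method shouldRestart are hypothetical names, and node IDs are assumed to be strings compared lexicographically. The description does not say which side should restart when the PT sizes are equal, so the tie-break direction here (the side with the greater Min ID restarts) is an arbitrary assumption; any fixed direction would be deterministic as long as both sides apply the same rule.

```java
import java.util.Objects;

/** Hypothetical sketch of the proposed rule; none of these names exist in Ignite. */
final class RefusedHandshakeDecision {
    /** What one node knows about a physical topology (PT): its size and its minimal node ID. */
    static final class TopologySummary {
        final int nodeCount;
        final String minNodeId;

        TopologySummary(int nodeCount, String minNodeId) {
            if (nodeCount < 1) {
                throw new IllegalArgumentException("A PT contains at least the node itself");
            }
            this.nodeCount = nodeCount;
            this.minNodeId = Objects.requireNonNull(minNodeId, "minNodeId");
        }
    }

    private RefusedHandshakeDecision() {
    }

    /**
     * Decides whether the refused node should restart.
     *
     * @param refused  PT summary as seen by the refused node
     * @param refusing PT summary sent by the refusing node
     */
    static boolean shouldRestart(TopologySummary refused, TopologySummary refusing) {
        // Rule from the description: the side that sees the smaller PT restarts.
        if (refused.nodeCount != refusing.nodeCount) {
            return refused.nodeCount < refusing.nodeCount;
        }
        // Assumption (not stated in the description): on equal sizes,
        // the side whose Min ID is greater restarts.
        return refused.minNodeId.compareTo(refusing.minNodeId) > 0;
    }
}
```

Under these assumptions, a single segmented node that is refused by a member of a three-node segment would restart, while a member of the three-node segment that is refused by the single node would not, which avoids the "whole cluster restarts" scenario from the description.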