[ https://issues.apache.org/jira/browse/CASSANDRA-21026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-21026:
----------------------------------------
Description:
If the address of a decommissioned node is later re-used by a new node, any
node in the cluster with Accord enabled will be unable to start up, including
the new node.
When the new node comes up and registers with the {{ClusterMetadataService}}, it
is added to the {{Directory}}.
The decommissioned node's details are also preserved in the directory, so that
transactions which were in flight when it left can still be completed
(https://issues.apache.org/jira/browse/CASSANDRA-20142).
During {{AccordService}} initialization, building the endpoint mapping fails
because of this check in {{EndpointMapping.Builder}}:
{code}
Invariants.requireArgument(!mapping.containsValue(endpoint), "Mapping already exists for %s", endpoint);
{code}
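To make the failure concrete, here is a minimal, hypothetical Java sketch of what happens when two directory entries share one address. It is a simplified model, not Cassandra's {{EndpointMapping.Builder}}: the class {{MappingSketch}}, the {{requireArgument}} stand-in, the use of plain strings for endpoints and ints for node ids, and the address {{10.0.0.1:7000}} are all illustrative assumptions. The only part taken from the report is the shape of the invariant (the map is keyed by node id, and an endpoint may appear as a value at most once).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the reported failure.
// NOT Cassandra's EndpointMapping: endpoints are Strings, node ids are ints.
final class MappingSketch {

    // Stand-in for Invariants.requireArgument (assumed behaviour: throw on false).
    static void requireArgument(boolean condition, String fmt, Object arg) {
        if (!condition)
            throw new IllegalArgumentException(String.format(fmt, arg));
    }

    // Builder keyed by node id; the invariant forbids two ids sharing one endpoint.
    static final class Builder {
        final Map<Integer, String> mapping = new HashMap<>();

        Builder add(String endpoint, int id) {
            requireArgument(!mapping.containsValue(endpoint),
                            "Mapping already exists for %s", endpoint);
            mapping.put(id, endpoint);
            return this;
        }
    }

    // Simulates building the mapping when the directory holds a new live node
    // (id 2) and a removed node (id 1) with the same address.
    static String buildWithReusedAddress() {
        Builder builder = new Builder();
        builder.add("10.0.0.1:7000", 2);       // new node re-using the address
        try {
            builder.add("10.0.0.1:7000", 1);   // removed node, same address
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints: Mapping already exists for 10.0.0.1:7000
        System.out.println(buildWithReusedAddress());
    }
}
```

Because the check is symmetric, the sketch fails the same way regardless of whether the live node or the removed node is added first.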
Additionally, it seems possible that the wrong method is being called in
{{AccordTopology::directoryToEndpointMapping}}:
{code}
// There are cases where nodes are removed from the cluster (host replacement, decom, etc.), but inflight events
// may still be happening; keep the ids around so pending events do not fail with a mapping error
for (Directory.RemovedNode removedNode : directory.removedNodes())
    builder.add(removedNode.endpoint, tcmIdToAccord(removedNode.id));
{code}
This should probably call {{builder::removed}} rather than {{builder::add}},
but {{removed}} contains the same invariant check.
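One possible direction, offered as a hedged sketch and not as the fix adopted for this issue: removed nodes only need the id-to-endpoint direction so that in-flight transactions can still resolve them, while endpoint uniqueness only really matters among live nodes. A {{removed}}-style method could therefore record the id without checking the endpoint against live entries. Everything here ({{SplitMappingSketch}}, its two maps, string endpoints, int ids) is hypothetical and does not reflect the real {{EndpointMapping}} API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Cassandra's EndpointMapping: live and removed nodes
// are tracked separately so a removed node's address can be re-used.
final class SplitMappingSketch {

    static final class Builder {
        // id -> endpoint for every known node, live or removed, so pending
        // work that references a removed id can still resolve an address.
        final Map<Integer, String> idToEndpoint = new HashMap<>();
        // endpoint -> id for live nodes only; uniqueness is enforced here.
        final Map<String, Integer> endpointToLiveId = new HashMap<>();

        Builder add(String endpoint, int id) {
            if (endpointToLiveId.containsKey(endpoint))
                throw new IllegalArgumentException("Mapping already exists for " + endpoint);
            endpointToLiveId.put(endpoint, id);
            idToEndpoint.put(id, endpoint);
            return this;
        }

        // Removed nodes only populate the id -> endpoint direction. The endpoint
        // may now belong to a live node, so no endpoint uniqueness check is made.
        Builder removed(String endpoint, int id) {
            idToEndpoint.putIfAbsent(id, endpoint);
            return this;
        }
    }

    public static void main(String[] args) {
        Builder builder = new Builder()
            .add("10.0.0.1:7000", 2)        // new node re-using the address
            .removed("10.0.0.1:7000", 1);   // decommissioned node, same address
        System.out.println(builder.endpointToLiveId.get("10.0.0.1:7000")); // prints 2
        System.out.println(builder.idToEndpoint.get(1));                   // prints 10.0.0.1:7000
    }
}
```

Keeping the uniqueness check on the live side still catches a genuine conflict where two live nodes claim one address. Whether a reverse lookup for a re-used address should ever resolve to the removed node is a question this sketch deliberately sidesteps.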
> Reusing the address of a removed node is not possible with Accord enabled
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-21026
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21026
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Accord, Cluster/Membership
> Reporter: Sam Tunnicliffe
> Priority: Normal
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)