[
https://issues.apache.org/jira/browse/CASSANDRA-20910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043069#comment-18043069
]
Berenguer Blasi commented on CASSANDRA-20910:
---------------------------------------------
Hi [~chrisjmiller],
I don't have a repro. Around the time it happened, I learned there had been
some networked storage plug-in/out that might or might not have messed up peer
tables, mixed configs, etc., all unintentional. So iiuc, if you move things
under C*'s feet enough, there's room for that to happen. I also managed to
repro locally once, though I haven't been able to do it again.
In any case, there are code paths that could benefit from better checks and
from logging the error. With the added logging we'll have additional info to
pin this down eventually. My patch will focus on the added logging and on
rejecting foreign nodes.
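Until that logging is in place, existing logs can be checked for the symptom
quoted in the report below, where ring 1 (storage port 7000) logged nodes on
port 7002 as "now part of the cluster". The following is only a minimal
sketch, not part of the patch. It assumes each log entry sits on a single line
(as in system.log, unlike the wrapped excerpts in this mail) and that a
differing storage port is a good enough signal of a foreign node. The function
name and the `expected_port` parameter are hypothetical.

```python
import re

# Matches Gossiper join messages such as:
#   "... Gossiper.java:1434 - Node /XX.XX.XX.190:7002 is now part of the cluster"
JOIN_RE = re.compile(r"Node /(\S+):(\d+) is now part of the cluster")


def find_foreign_joins(log_lines, expected_port):
    """Return 'address:port' for every join message whose port differs
    from the storage port this ring is expected to use (an assumption:
    the other ring is distinguishable by port alone)."""
    foreign = []
    for line in log_lines:
        match = JOIN_RE.search(line)
        if match and int(match.group(2)) != expected_port:
            foreign.append(f"{match.group(1)}:{match.group(2)}")
    return foreign
```

For example, `find_foreign_joins(open("system.log"), 7000)` run against a
ring 1 node would list any endpoints that joined on a different port.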
> Instances from a 2nd ring join another ring when running on the same nodes
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-20910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20910
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Chris Miller
> Assignee: Berenguer Blasi
> Priority: Urgent
> Fix For: 4.1.x, 5.0.x, 6.x
>
>
> Hi,
> We experienced an issue today in which instances from a 2nd ring joined
> another ring running on the same nodes. This happened after a rolling restart
> that followed an OS patch and node reboot (both on Cassandra 4.1.2).
> The cluster names and storage ports are different and this type of activity
> normally runs without issue.
> Any ideas as to what could have happened? Could this be a bug?
> The seeds use the same IP addresses but no storage port is configured in the
> seeds parameter. Should we add the storage port to prevent this from
> happening again? Any thoughts?
> Messages like the following could be seen on ring 1.
> INFO [GossipStage:1] 2025-09-18 04:11:49,040 Gossiper.java:1434 - Node
> /XX.XX.XX.190:7002 is now part of the cluster
> INFO [GossipStage:1] 2025-09-18 04:11:49,043 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.190:7002
> INFO [Messaging-EventLoop-3-8] 2025-09-18 04:11:49,044
> OutboundConnection.java:1153 -
> /XX.XX.XX.61:7000(/XX.XX.XX.61:41920)->/XX.XX.XX.190:7002-URGENT_MESSAGES-7af53583
> successfully connected, version = 12, framing = CRC, encryption = unencrypted
> INFO [GossipStage:1] 2025-09-18 04:11:49,044 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.190:7002
> INFO [GossipStage:1] 2025-09-18 04:11:49,044 Gossiper.java:1434 - Node
> /XX.XX.XX.214:7002 is now part of the cluster
> INFO [Messaging-EventLoop-3-3] 2025-09-18 04:11:49,046
> OutboundConnection.java:1153 -
> /XX.XX.XX.61:7000(/XX.XX.XX.61:62628)->/XX.XX.XX.214:7002-URGENT_MESSAGES-0515b24a
> successfully connected, version = 12, framing = CRC, encryption = unencrypted
> INFO [GossipStage:1] 2025-09-18 04:11:49,046 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.214:7002
> INFO [GossipStage:1] 2025-09-18 04:11:49,046 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.214:7002
> INFO [GossipStage:1] 2025-09-18 04:11:49,047 Gossiper.java:1434 - Node
> /XX.XX.XX.247:7002 is now part of the cluster
> INFO [Messaging-EventLoop-3-4] 2025-09-18 04:11:49,048
> InboundConnectionInitiator.java:529 -
> /XX.XX.XX.190:7002(/XX.XX.XX.190:60180)->/XX.XX.XX.61:7000-URGENT_MESSAGES-edfb2d8f
> messaging connection established, version = 12, framing = LZ4, encryption =
> unencrypted
> Messages like the following in ring 2:
> WARN [GossipStage:1] 2025-09-18 04:11:49,304
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.247:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:49,819
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.108:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:51,598
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.190:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:52,361
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.111:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:53,489
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.84:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:58,322
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.247:7000 ring1!=ring2
> Instances from ring2 were listed in nodetool describecluster as unreachable
> under schema versions.
> They were also listed as DN under nodetool status.
> The nodetool removenode command was used to remove the instances successfully.
> Regards,
> Chris.