[ https://issues.apache.org/jira/browse/CASSANDRA-20476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939164#comment-17939164 ]
Marcus Eriksson commented on CASSANDRA-20476:
---------------------------------------------

The reason for this is that on startup we now need to commit the IP change transformation ({{Startup}}), but since all IPs have changed we don't know where to commit it.

There should of course be a way to recover from this, and I think it could be done today by dumping the cluster metadata, offline-rewriting the placements and using {{-Dcassandra.unsafe_boot_with_clustermetadata=<file>}}, but that is quite tricky. A quick fix would be to add a startup parameter, something like {{-Dcassandra.unsafe.resetcms=<host>}}, which would make <host> the CMS and let the replicas in the cluster commit their IP change transformations (a rough sketch of such a flag is appended at the end of this message). Longer term we can probably add functionality to discover the new CMS via the seeds.

I'll try to get a patch up for the startup parameter soon.

> Cluster is unable to recover after shutdown if IPs change
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-20476
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20476
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: Michael Burman
>            Priority: Normal
>             Fix For: 5.x
>
>
> When a cluster is shut down for any reason in an environment where the IPs can change, the current TCM implementation prevents the cluster from recovering. The previous Gossip-based system was able to restart correctly in this situation, but the first node to start with TCM gets stuck trying to find nodes that no longer exist, which prevents startup entirely.
> What happens is that it spams the following to the logs:
> {noformat}
> WARN  [InternalResponseStage:218] 2025-03-24 12:31:53,433 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
> WARN  [InternalResponseStage:219] 2025-03-24 12:32:03,496 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
> WARN  [Messaging-EventLoop-3-3] 2025-03-24 12:32:13,528 NoSpamLogger.java:107 - /10.244.4.8:7000->/10.244.3.4:7000-URGENT_MESSAGES-[no-channel] dropping message of type TCM_COMMIT_REQ whose timeout expired before reaching the network
> WARN  [InternalResponseStage:220] 2025-03-24 12:32:13,529 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
> INFO  [Messaging-EventLoop-3-6] 2025-03-24 12:32:23,373 NoSpamLogger.java:104 - /10.244.4.8:7000->/10.244.6.7:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.ConnectTimeoutException: connection timed out after 2000 ms: /10.244.6.7:7000
>         at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
>         at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>         at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:156)
>         at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
>         at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
>         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:408)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> ... and it does not move forward. The node is assigned as its own seed node with its current IP address, which is 10.244.4.8 in this case:
> {noformat}
> INFO  [main] 2025-03-24 11:55:16,938 InboundConnectionInitiator.java:165 - Listening on address: (/10.244.4.8:7000), nic: eth0, encryption: unencrypted
> {noformat}
> However, as seen from nodetool, it has no idea of this:
> {noformat}
> [cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool status
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load  Tokens  Owns (effective)  Host ID                               Rack
> DN  10.244.4.7  ?     16      64.7%             6d194555-f6eb-41d0-c000-000000000001  r1
> DN  10.244.6.7  ?     16      59.3%             6d194555-f6eb-41d0-c000-000000000002  r2
> DN  10.244.3.4  ?     16      76.0%             6d194555-f6eb-41d0-c000-000000000003  r3
> [cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool cms
> Cluster Metadata Service:
> Members: /10.244.3.4:7000
> Needs reconfiguration: false
> Is Member: false
> Service State: REMOTE
> Is Migrating: false
> Epoch: 24
> Local Pending Count: 0
> Commits Paused: false
> Replication factor: ReplicationParams{class=org.apache.cassandra.locator.MetaStrategy, dc1=1}
> [cassandra@cluster1-dc1-r1-sts-0 /]$
> {noformat}
> It will also not start listening on port 9042. It will wait for the others forever, never realizing that its own IP address has changed. Since this happens to every node, the entire cluster is effectively dead.
> In this configuration I used a 3-rack, 3-node system and simply stopped the cluster in Kubernetes. initial_location_provider was RackDCFileLocationProvider and node_proximity was NetworkTopologyProximity, as these should behave like GossipingPropertyFileSnitch (according to the documentation). This works fine with the older Gossip implementation.
> The IPs in a Kubernetes deployment change every time a pod is deleted, so assuming any sort of static IPs is not going to work and would be a serious downgrade from 5.0.
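A minimal sketch of how the proposed flag might be read at startup. Only the property name comes from the proposal above; the class name, the default port and the parsing rules are illustrative assumptions rather than actual Cassandra internals, and a real patch would additionally have to force the local cluster metadata to treat <host> as the sole CMS member.

{noformat}
// Hypothetical sketch: -Dcassandra.unsafe.resetcms=<host> is only proposed
// in the comment above and does not exist yet. All names below are
// illustrative, not real Cassandra internals.
import java.net.InetSocketAddress;

public final class UnsafeResetCms
{
    // Property name as proposed in the comment; subject to change.
    private static final String RESET_CMS_PROP = "cassandra.unsafe.resetcms";
    // Assumed default: the standard storage port.
    private static final int DEFAULT_STORAGE_PORT = 7000;

    /**
     * Returns null when the flag is absent (normal startup), otherwise the
     * address of the node that should become the sole CMS member, so that the
     * remaining replicas have somewhere to commit their {{Startup}} IP change
     * transformations. Accepts "host" or "host:port"; IPv6 bracket syntax is
     * ignored here for brevity.
     */
    public static InetSocketAddress parseResetCmsHost()
    {
        String value = System.getProperty(RESET_CMS_PROP);
        if (value == null || value.isEmpty())
            return null;

        int colon = value.lastIndexOf(':');
        if (colon > 0 && colon < value.length() - 1)
            return new InetSocketAddress(value.substring(0, colon),
                                         Integer.parseInt(value.substring(colon + 1)));
        return new InetSocketAddress(value, DEFAULT_STORAGE_PORT);
    }

    public static void main(String[] args)
    {
        // e.g. java -Dcassandra.unsafe.resetcms=10.244.4.8:7000 UnsafeResetCms
        System.out.println("Reset CMS to: " + parseResetCmsHost());
    }
}
{noformat}

With such a flag, the cluster above could hypothetically be recovered by restarting one node with something like -Dcassandra.unsafe.resetcms=10.244.4.8 (one of the new pod IPs), making that node the CMS so the remaining replicas can commit their IP change transformations against it.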