Michael Burman created CASSANDRA-20476:
------------------------------------------

             Summary: Cluster is unable to recover after shutdown if IPs change
                 Key: CASSANDRA-20476
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20476
             Project: Apache Cassandra
          Issue Type: Bug
          Components: Transactional Cluster Metadata
            Reporter: Michael Burman


When a cluster is shut down for any reason in an environment where node IPs can 
change, the current TCM implementation prevents the cluster from recovering. 
The previous Gossip-based system was able to restart correctly after such an 
event, but with TCM the first node to start gets stuck trying to contact nodes 
that no longer exist at their recorded addresses, which prevents startup entirely.

The node repeatedly spams the following to the logs:


{noformat}
WARN  [InternalResponseStage:218] 2025-03-24 12:31:53,433 
RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when 
sending TCM_COMMIT_REQ, retrying on 
CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
WARN  [InternalResponseStage:219] 2025-03-24 12:32:03,496 
RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when 
sending TCM_COMMIT_REQ, retrying on 
CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
WARN  [Messaging-EventLoop-3-3] 2025-03-24 12:32:13,528 NoSpamLogger.java:107 - 
/10.244.4.8:7000->/10.244.3.4:7000-URGENT_MESSAGES-[no-channel] dropping 
message of type TCM_COMMIT_REQ whose timeout expired before reaching the network
WARN  [InternalResponseStage:220] 2025-03-24 12:32:13,529 
RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when 
sending TCM_COMMIT_REQ, retrying on 
CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
INFO  [Messaging-EventLoop-3-6] 2025-03-24 12:32:23,373 NoSpamLogger.java:104 - 
/10.244.4.8:7000->/10.244.6.7:7000-URGENT_MESSAGES-[no-channel] failed to 
connect
io.netty.channel.ConnectTimeoutException: connection timed out after 2000 ms: 
/10.244.6.7:7000
        at 
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at 
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:156)
        at 
io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
        at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:408)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}

The node never moves forward. It has been assigned as its own seed node with 
its current IP address, which is 10.244.4.8 in this case. 

{noformat}
INFO  [main] 2025-03-24 11:55:16,938 InboundConnectionInitiator.java:165 - 
Listening on address: (/10.244.4.8:7000), nic: eth0, encryption: unencrypted
{noformat}
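
For context, the effective seed configuration at restart would have looked 
roughly like the following. This is an illustrative sketch only; the exact seed 
provider and operator wiring in this deployment are assumptions:

{noformat}
# cassandra.yaml (illustrative sketch, not the exact deployment config)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # the pod's own, current IP - the old IPs recorded in TCM no longer exist
      - seeds: "10.244.4.8"
{noformat}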

However, as seen from nodetool, the node has no awareness of its own new address:

{noformat}
[cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load  Tokens  Owns (effective)  Host ID                         
      Rack
DN  10.244.4.7  ?     16      64.7%             
6d194555-f6eb-41d0-c000-000000000001  r1  
DN  10.244.6.7  ?     16      59.3%             
6d194555-f6eb-41d0-c000-000000000002  r2  
DN  10.244.3.4  ?     16      76.0%             
6d194555-f6eb-41d0-c000-000000000003  r3  

[cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool cms
Cluster Metadata Service:
Members: /10.244.3.4:7000
Needs reconfiguration: false
Is Member: false
Service State: REMOTE
Is Migrating: false
Epoch: 24
Local Pending Count: 0
Commits Paused: false
Replication factor: 
ReplicationParams{class=org.apache.cassandra.locator.MetaStrategy, dc1=1}
[cassandra@cluster1-dc1-r1-sts-0 /]$
{noformat}

The node also never starts listening on port 9042. It waits forever for the 
other nodes, not understanding that its own IP address has changed. Since this 
happens to every node, the entire cluster is effectively dead. 

In this setup I used a 3-rack, 3-node cluster and simply stopped it in 
Kubernetes. initial_location_provider was set to RackDCFileLocationProvider and 
node_proximity to NetworkTopologyProximity, since according to the 
documentation these should behave like GossipingPropertyFileSnitch (a sketch of 
the settings follows below). The same scenario works fine with the older 
Gossip-based implementation. 
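
The relevant cassandra.yaml entries were along these lines (a sketch of what 
was described above; the cassandra-rackdc.properties values are taken from the 
nodetool output and shown for one rack only):

{noformat}
# cassandra.yaml (sketch)
initial_location_provider: RackDCFileLocationProvider
node_proximity: NetworkTopologyProximity

# cassandra-rackdc.properties (per pod; rack differs per StatefulSet)
dc=dc1
rack=r1
{noformat}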

In a Kubernetes deployment the pod IPs change every time a pod is deleted, so 
assuming any kind of static IPs is not going to work and would be a serious 
downgrade from 5.0.
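
A rough way to reproduce the IP churn is sketched below. The StatefulSet name 
is inferred from the pod name in the nodetool output and is an assumption; each 
rack appears to be its own StatefulSet (r1/r2/r3):

{noformat}
# Scale the cluster down and back up; every pod returns with a new pod IP.
kubectl scale statefulset cluster1-dc1-r1-sts --replicas=0   # repeat for the r2/r3 StatefulSets
kubectl scale statefulset cluster1-dc1-r1-sts --replicas=1
# The pod comes back with a pod IP that no longer matches the addresses recorded in TCM.
kubectl get pod cluster1-dc1-r1-sts-0 -o wide
{noformat}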


