Hi, We are trying to roll upgrade from 1.0.12 to 1.2.5, but we found that the 1.2.5 node cannot see other old nodes. Therefore, we tried to upgrade to 1.1.12 first, and it works. However, we still saw the same issue when rolling upgrade from 1.1.12 to 1.2.5. This seems to be the fixed issue as https://issues.apache.org/jira/browse/CASSANDRA-5332 but we still saw it in 1.2.5.
Enviroment: OS: CentOS 6 JDK: 6u31 cluster:3 nodes for testing, in EC2 Snitch: Ec2MultiRegionSnitch NetworkTopologyStrategy: strategy_options = { ap-southeast:3 } We have 3 nodes and we upgraded 122.248.xxx.xxx to 1.2.5 first, the other 2 nodes are still 1.1.12. When we restarted the upgraded node, it will see the other 2 old nodes as UP in the log. However, after a few seconds, these 2 nodes will be marked as DOWN. This is the ring info from 1.2.5 node - 122.248.xxx.xxx Note: Ownership information does not include topology; for complete information, specify a keyspace Datacenter: ap-southeast ========== Address Rack Status State Load Owns Token 113427455640312821154458202477256070486 122.248.xxx.xxx 1b Up Normal 69.74 GB 33.33% 1 54.251.xxx.xxx 1b Down Normal 69.77 GB 33.33% 56713727820156410577229101238628035243 54.254.xxx.xxx 1b Down Normal 70.28 GB 33.33% 113427455640312821154458202477256070486 but Old 1.1.12 nodes can see new node: Note: Ownership information does not include topology, please specify a keyspace. Address DC Rack Status State Load Owns Token 113427455640312821154458202477256070486 122.248.xxx.xxx ap-southeast1b Up Normal 69.74 GB 33.33% 1 54.251.xxx.xxx ap-southeast1b Up Normal 69.77 GB 33.33% 56713727820156410577229101238628035243 54.254.xxx.xxx ap-southeast1b Up Normal 70.28 GB 33.33% 113427455640312821154458202477256070486 We enabled trace log level to check gossip related logs. The log below from 1.2.5 node shows that the other 2 nodes are UP in the beginning. They seem to complete SYN/ACK/ACK2 handshake cycle. TRACE [GossipStage:1] 2013-06-19 07:44:43,047 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage from /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,047 GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are : /54.254.xxx.xxx:1371617084:10967 /54.251. xxx.xxx:1371625851:2055 TRACE [GossipStage:1] 2013-06-19 07:44:43,048 Gossiper.java (line 945) requestAll for /54.254.xxx.xxx ..... TRACE [GossipStage:1] 2013-06-19 07:44:43,080 GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage to /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,080 MessagingService.java (line 601) /122.248.216.142 sending GOSSIP_DIGEST_ACK to 19@/54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,080 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage from /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,080 GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are : /54.254.xxx.xxx:1371617084:10978 /54.251. xxx.xxx:1371625851:2066 ..... TRACE [GossipStage:1] 2013-06-19 07:44:43,083 GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage to /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,083 MessagingService.java (line 601) /122.248.216.142 sending GOSSIP_DIGEST_ACK to 22@/54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,084 GossipDigestAck2VerbHandler.java (line 38) Received a GossipDigestAck2Message from /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,084 FailureDetector.java (line 168) reporting /54.254.xxx.xxx INFO [GossipStage:1] 2013-06-19 07:44:43,093 Gossiper.java (line 805) Node /54.254.xxx.xxx is now part of the cluster TRACE [GossipStage:1] 2013-06-19 07:44:43,094 Gossiper.java (line 808) Adding endpoint state for /54.254.xxx.xxx DEBUG [GossipStage:1] 2013-06-19 07:44:43,094 MessagingService.java (line 377) Resetting pool for /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:43,094 Gossiper.java (line 764) marking as alive /54.254.xxx.xxx DEBUG [GossipStage:1] 2013-06-19 07:44:43,095 Gossiper.java (line 770) removing expire time for endpoint : /54.254.xxx.xxx INFO [GossipStage:1] 2013-06-19 07:44:43,095 Gossiper.java (line 771) InetAddress /54.254.xxx.xxx is now UP After a few seconds, the log shows that the old nodes did not respond ACK2 and are marked as DOWN: TRACE [GossipStage:1] 2013-06-19 07:44:52,121 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage from /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:52,123 GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are : /54.254.xxx.xxx:1371617084:10991 /54.251. xxx.xxx:1371625851:2078 TRACE [GossipStage:1] 2013-06-19 07:44:52,124 GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage to /54.254.xxx.xxx TRACE [GossipStage:1] 2013-06-19 07:44:52,124 MessagingService.java (line 601) /122.248.xxx.xxx sending GOSSIP_DIGEST_ACK to 52@/54.254.xxx.xxx TRACE [GossipTasks:1] 2013-06-19 07:44:52,358 Gossiper.java (line 116) My heartbeat is now 25 TRACE [GossipTasks:1] 2013-06-19 07:44:52,358 Gossiper.java (line 350) Gossip Digests are : /54.251.xxx.xxx:1371625851:2069 /122.248.xxx. xxx:1371627882:25 /54.254.xxx.xxx:1371617084:10980 TRACE [GossipTasks:1] 2013-06-19 07:44:52,358 Gossiper.java (line 505) Sending a GossipDigestSyn to /54.254.xxx.xxx ... TRACE [GossipTasks:1] 2013-06-19 07:44:52,358 MessagingService.java (line 601) /122.248.xxx.xxx sending GOSSIP_DIGEST_SYN to 54@/54.254.xxx.xxx TRACE [GossipTasks:1] 2013-06-19 07:44:52,359 Gossiper.java (line 154) Performing status check ... TRACE [GossipTasks:1] 2013-06-19 07:44:52,359 FailureDetector.java (line 189) PHI for /54.254.xxx.xxx : 8.05616263930532 TRACE [GossipTasks:1] 2013-06-19 07:44:52,359 FailureDetector.java (line 193) notifying listeners that /54.254.xxx.xxx is down TRACE [GossipTasks:1] 2013-06-19 07:44:52,359 FailureDetector.java (line 194) intervals: 500.0 mean: 500.0 TRACE [GossipTasks:1] 2013-06-19 07:44:52,359 Gossiper.java (line 781) marking as down /54.254.xxx.xxx INFO [GossipTasks:1] 2013-06-19 07:44:52,360 Gossiper.java (line 785) InetAddress /54.254.xxx.xxx is now DOWN DEBUG [GossipTasks:1] 2013-06-19 07:44:52,360 MessagingService.java (line 377) Resetting pool for /54.254.xxx.xxx We check that old node try to send SYN to 1.2.5 node but does not receive any GossipDigestAckMessage from it. Any ideas how to solve this issue? Thank you!