This relates to the issue I opened the other day: https://issues.apache.org/jira/browse/CASSANDRA-3175. Basically, 'nodetool ring' throws an exception on two of the four nodes.
In my fancy little world, the problems appear to be related to one of the nodes thinking that someone is their neighbor ... and that someone moved away a long time ago ...

/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed.

This appears only in the logs of one node that is generating the issue: 172.16.12.10

Where do I find where AntiEntropyService.getNeighbors(tablename, range) is pulling its information from?

On the two nodes that work:

[default@system] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
[default@system]

From the two nodes that don't work:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
        UNREACHABLE: [10.130.185.136] --> which is really 172.16.14.10
[default@unknown]

Really now. Where does 10.130.185.136 exist? It's in none of the configurations I have, AND the full ring has been shut down and started up ...
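For anyone wanting to run the same cross-check on their own nodes, here is a minimal sketch that pulls the "dead neighbor" addresses out of system.log lines and flags any that are not in the ring membership reported by 'describe cluster'. It assumes the log message format shown above; the function names (dead_neighbors, unknown_addresses) are made up for illustration and are not part of Cassandra.

```python
import re
from collections import Counter

# Matches the repair failure message format seen in the log lines above.
DEAD_NEIGHBOR = re.compile(r"because a neighbor \(/(?P<ip>[\d.]+)\) is dead")


def dead_neighbors(lines):
    """Count how often each address is reported as a dead neighbor."""
    counts = Counter()
    for line in lines:
        match = DEAD_NEIGHBOR.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def unknown_addresses(counts, ring_members):
    """Return reported addresses that are not current ring members."""
    return sorted(ip for ip in counts if ip not in ring_members)


if __name__ == "__main__":
    # Sample lines copied from the log excerpt above; in practice,
    # iterate over the lines of /mnt/cassandra/logs/system.log instead.
    sample = [
        "INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 "
        "AntiEntropyService.java (line 658) Could not proceed on repair "
        "because a neighbor (/10.130.185.136) is dead: "
        "manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.",
        "INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 "
        "AntiEntropyService.java (line 658) Could not proceed on repair "
        "because a neighbor (/10.130.185.136) is dead: "
        "manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.",
    ]
    # Ring membership as listed by 'describe cluster' on the working nodes.
    ring = {"172.16.12.10", "172.16.12.11", "172.16.14.12", "172.16.14.10"}

    counts = dead_neighbors(sample)
    for ip in unknown_addresses(counts, ring):
        print(f"{ip}: reported dead {counts[ip]} time(s), not in ring")
```

This only confirms which stale addresses a node is complaining about; it says nothing about where the node picked them up.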
Not trying to give Vijay a hard time by posting here, btw! Just thinking it could be something super silly ... that a wider audience has come across.

--
Sasha Dolgy
sasha.do...@gmail.com