It looks like the node is sending out its application state and waiting the required time, after which it expects to know about all the other nodes in the cluster.
> INFO [main] 2011-03-07 17:04:06,660 StorageService.java (line 399) Joining:
> sleeping 30000 ms for pending range setup

For some reason it cannot see the other nodes. This could be a config thing or a networking thing.

I was a bit off in my analysis before. When bootstrapping, the node is smart enough to wait for gossip to kick in and tell it about the others in the cluster.

Try the following:
- check network connectivity between the problem node and the others, and check they have the same config
- try to bring up the problem node with auto_bootstrap off. If it can start, check its view of the cluster with nodetool ring
- if that fails, turn on TRACE logging on all nodes and try to bring up the problem node. This will log a lot of messages about what Gossip is doing.

Aaron

On 8/03/2011, at 2:49 PM, mcasandra wrote:

>
> aaron morton wrote:
>>
>> 2) um, not sure. The nodetool output below looks like there are only 2
>> nodes in that cluster, i.e. there are no down nodes.
>>
> There are actually 3 nodes. Not sure why it's not showing the other node in
> the output which is currently down. The error I am getting is from the
> 3rd node that is currently down.
>
> Here are the logs which shows it tried to talk to other 2 nodes:
>
> ---
> INFO [HintedHandoff:1] 2011-03-07 17:02:36,463 HintedHandOffManager.java
> (line 248) Finished hinted handoff of 0 rows to endpoint /181.116.206.179
> INFO [GossipStage:1] 2011-03-07 17:02:36,463 StorageService.java (line 606)
> Node /181.116.208.68 state jump to normal
> INFO [HintedHandoff:1] 2011-03-07 17:02:36,463 HintedHandOffManager.java
> (line 192) Started hinted handoff for endpoint /181.116.208.68
> INFO [HintedHandoff:1] 2011-03-07 17:02:36,464 HintedHandOffManager.java
> (line 248) Finished hinted handoff of 0 rows to endpoint /181.116.208.68
> INFO [main] 2011-03-07 17:04:06,424 StorageService.java (line 399) Joining:
> getting bootstrap token
> INFO [main] 2011-03-07 17:04:06,426 ColumnFamilyStore.java (line 648)
> switching in a fresh Memtable for LocationInfo at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1299546155643.log',
> position=296)
> INFO [main] 2011-03-07 17:04:06,426 ColumnFamilyStore.java (line 952)
> Enqueuing flush of Memtable-LocationInfo@1367996500(36 bytes, 1 operations)
> INFO [FlushWriter:1] 2011-03-07 17:04:06,427 Memtable.java (line 155)
> Writing Memtable-LocationInfo@1367996500(36 bytes, 1 operations)
> INFO [FlushWriter:1] 2011-03-07 17:04:06,659 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-80-Data.db
> (156 bytes)
> INFO [CompactionExecutor:1] 2011-03-07 17:04:06,660 CompactionManager.java
> (line 272) Compacting
> [org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-77-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-78-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-79-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-80-Data.db')]
> INFO [main] 2011-03-07 17:04:06,660 StorageService.java (line 399) Joining:
> sleeping 30000 ms for pending range setup
> INFO [CompactionExecutor:1] 2011-03-07 17:04:06,849 CompactionManager.java
> (line 354) Compacted to
> /var/lib/cassandra/data/system/LocationInfo-tmp-e-81-Data.db. 1,293 to 832
> (~64% of original) bytes for 4 keys. Time: 185ms.
> INFO [main] 2011-03-07 17:04:36,667 StorageService.java (line 399)
> Bootstrapping
> ERROR [main] 2011-03-07 17:04:36,677 AbstractCassandraDaemon.java (line 234)
> Exception encountered during startup.
> java.lang.IllegalStateException: replication factor (3) exceeds number of
> endpoints (2)
> ----
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Exception-when-bringing-up-nodes-during-failure-testing-tp6085692p6099853.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
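
P.S. For the network connectivity check in the first step, something like the following could be run from the problem node. It is only a rough sketch: it assumes the nodes use the default storage_port of 7000 from cassandra.yaml (adjust it if your config differs), and the names can_connect and check_peers are made up for illustration. A successful TCP connect only shows the port is reachable; it does not prove gossip is healthy.

```python
import socket

# Assumption: 7000 is the default storage_port in cassandra.yaml.
# Change this if your cluster is configured differently.
STORAGE_PORT = 7000


def can_connect(host, port=STORAGE_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(peers, port=STORAGE_PORT, timeout=3.0):
    """Map each peer address to whether its storage port accepted a connection."""
    return {host: can_connect(host, port, timeout) for host in peers}
```

For example, check_peers(["181.116.206.179", "181.116.208.68"]) uses the two endpoints that appear in the quoted log. Any False entry points at a firewall, routing or listen_address problem rather than at bootstrap itself.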