OK, it seems a "phantom" node (one that was removed from the cluster)
kept being passed around in gossip as a down endpoint and was messing
up the gossip algorithm.  I had the luxury of being able to stop the
entire cluster and bring the nodes up one by one.  That purged the bad
node from gossip.  I'm not sure whether there is a more elegant way to
do that.
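
For anyone who hits the same exception: here is a rough sketch of the kind of guard that produces the "replication factor (3) exceeds number of endpoints (1)" message in the trace below. It is not Cassandra's code. The function name, the ring layout, and the IP addresses are all made up for illustration, and it assumes only that the node's view of the ring held a single endpoint when pending ranges were calculated.

```python
def natural_endpoints(ring, replication_factor):
    """Pick replicas from a token ring (hypothetical helper, not Cassandra's API).

    ring: list of (token, endpoint) pairs, assumed sorted by token.
    """
    # Collect the distinct endpoints in ring order.
    endpoints = []
    for _token, endpoint in ring:
        if endpoint not in endpoints:
            endpoints.append(endpoint)

    # The guard: there must be at least as many endpoints as replicas wanted.
    if replication_factor > len(endpoints):
        raise RuntimeError(
            "replication factor (%d) exceeds number of endpoints (%d)"
            % (replication_factor, len(endpoints))
        )
    return endpoints[:replication_factor]


# A ring view with three endpoints satisfies RF=3 (addresses are invented).
healthy_view = [(0, "10.0.0.1"), (100, "10.0.0.2"), (200, "10.0.0.3")]
print(natural_endpoints(healthy_view, 3))

# A ring view with a single endpoint fails the same way as the log below.
try:
    natural_endpoints([(0, "10.0.0.1")], 3)
except RuntimeError as err:
    print(err)
```

The point is only that the error is about how many endpoints the node believes are in the ring, not about how many machines are actually running.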

On Fri, May 27, 2011 at 9:28 AM,  <jonathan.co...@gmail.com> wrote:
> Anyone have any idea what this could mean?
> This is a cluster of 7 nodes, I'm trying to add the 8th node.
>
> INFO [FlushWriter:1] 2011-05-27 09:22:40,495 Memtable.java (line 164)
> Completed flushing /var/lib/cassandra/data/system/Migrations-f-1-Data.db
> (6358 bytes)
> INFO [FlushWriter:1] 2011-05-27 09:22:40,496 Memtable.java (line 157)
> Writing Memtable-Schema@60230368(2363 bytes, 3 operations)
> INFO [FlushWriter:1] 2011-05-27 09:22:40,562 Memtable.java (line 164)
> Completed flushing /var/lib/cassandra/data/system/Schema-f-1-Data.db (2513
> bytes)
> INFO [GossipStage:1] 2011-05-27 09:22:40,829 Gossiper.java (line 610) Node
> /10.46.108.104 is now part of the cluster
> ERROR [GossipStage:1] 2011-05-27 09:22:40,845
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.IllegalStateException: replication factor (3) exceeds number of
> endpoints (1)
> at
> org.apache.cassandra.locator.OldNetworkTopologyStrategy.calculateNaturalEndpoints(OldNetworkTopologyStrategy.java:100)
> at
> org.apache.cassandra.locator.AbstractReplicationStrategy.getAddressRanges(AbstractReplicationStrategy.java:196)
> at
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:945)
> at
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
> at
> org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:707)
> at
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:648)
> at
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1124)
> at
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:643)
> at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:611)
> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:690)
> at
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
> at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> ERROR [GossipStage:1] 2011-05-27 09:22:40,847 AbstractCassandraDaemon.java
> (line 112) Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.IllegalStateException: replication factor (3) exceeds number of
> endpoints (1)
> at
> org.apache.cassandra.locator.OldNetworkTopologyStrategy.calculateNaturalEndpoints(OldNetworkTopologyStrategy.java:100)
> at
> org.apache.cassandra.locator.AbstractReplicationStrategy.getAddressRanges(AbstractReplicationStrategy.java:196)
> at
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:945)
> at
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
> at
> org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:707)
> at
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:648)
> at
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1124)
> at
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:643)
> at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:611)
> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:690)
> at
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:60)
> at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
