I've seen this around a couple of times now. 

One reason to fail if there are not enough nodes to meet the replication factor 
is that CL.ALL requests cannot be processed. You could make the argument that 
we can get into that state at any time if a node is down. But this error means 
there have never been enough nodes in the ring, regardless of their up/down 
state, so Cassandra will never be able to meet the replication guarantees for 
the keyspace. E.g. if you kicked off a repair it would not leave the cluster in 
the expected state. 

Not sure if this is the official reason, just my thinking. And there may be 
other reasons. 
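
For what it's worth, the invariant being enforced looks roughly like the sketch 
below. This is not the actual SimpleStrategy code, just an illustration of the 
check implied by the exception message; the class and method names here are 
made up.

import java.util.Arrays;
import java.util.List;

public class ReplicaPlacementCheck
{
    // 'endpoints' stands in for the nodes currently known to the token ring.
    static void assertEnoughEndpoints(List<String> endpoints, int replicationFactor)
    {
        // A keyspace with RF N needs at least N distinct endpoints to hold its replicas.
        if (replicationFactor > endpoints.size())
            throw new IllegalStateException(String.format(
                    "replication factor (%d) exceeds number of endpoints (%d)",
                    replicationFactor, endpoints.size()));
    }

    public static void main(String[] args)
    {
        // Two nodes in the ring plus a keyspace defined with RF 3 trips the
        // same condition shown in the bootstrap trace below.
        assertEnoughEndpoints(Arrays.asList("10.0.18.99", "10.0.18.100"), 3);
    }
}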

Sounds like you've made progress though. 

Cheers
Aaron

On 9/03/2011, at 4:23 AM, Patrik Modesto wrote:

> Hi,
> 
> I have a small test cluster, 2 servers, both successfully running
> Cassandra 0.7.3. I have three keyspaces, two with RF1 and one with RF3.
> Now when I try to bootstrap a 3rd server (empty initial_token,
> auto_bootstrap: true), I get this exception on the new server.
> 
> INFO 23:13:43,229 Joining: getting bootstrap token
> INFO 23:13:43,258 New token will be 127097301048222781806986236020167142093 to assume load from /10.0.18.99
> INFO 23:13:43,259 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/mnt/disk8/cassandra/data/CommitLog-1299622332896.log', position=1578072)
> INFO 23:13:43,259 Enqueuing flush of Memtable-LocationInfo@1526249359(106 bytes, 3 operations)
> INFO 23:13:43,259 Writing Memtable-LocationInfo@1526249359(106 bytes, 3 operations)
> INFO 23:13:43,276 Completed flushing /mnt/disk3/cassandra/data/system/LocationInfo-f-2-Data.db (211 bytes)
> INFO 23:13:43,277 Joining: sleeping 30000 ms for pending range setup
> INFO 23:14:13,277 Bootstrapping
> java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
> Caused by: java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:212)
>        at org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:198)
>        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
>        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:525)
>        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:453)
>        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:403)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:194)
>        at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:217)
>        ... 5 more
> Cannot load daemon
> Service exit with a return value of 3
> 
> On the other servers I get:
> 
> ERROR 15:54:24,670 Error in ThreadPoolExecutor
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:929)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:895)
>        at org.apache.cassandra.service.StorageService.handleStateLeaving(StorageService.java:797)
>        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:651)
>        at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:763)
>        at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:753)
>        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:670)
>        at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR 15:54:24,672 Fatal exception in thread Thread[GossipStage:1,5,main]
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>        at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:929)
>        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:895)
>        at org.apache.cassandra.service.StorageService.handleStateLeaving(StorageService.java:797)
>        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:651)
>        at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:763)
>        at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:753)
>        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:670)
>        at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 
> 
> Removing the keyspace with RF3 fixed the problem and the bootstrap went
> well, but why is there a problem when there are fewer nodes than the
> replication factor? I can imagine a situation where I would need to
> remove nodes from the cluster and end up with fewer servers than the
> maximum RF used. I'd then be unable to bootstrap new servers into the
> cluster. Removing the keyspace is not an option in a production
> environment.
> 
> Thanks,
> Patrik
