IMHO being able to create a keyspace with an RF higher than the number of nodes sounds like a bug. It puts the cluster into a bad state. It may even be a regression; I will take a look at the code.

The assertion is interesting. Can you reproduce it with logging at debug and post the results? Could you try to reproduce it with a clean cluster?

Thanks
Aaron

On 11/03/2011, at 10:24 PM, Patrik Modesto <patrik.mode...@gmail.com> wrote:

> Unfortunately I can't provide the info, I deleted it. It was in a very
> strange state.
>
> I started with a new cluster today, 2 nodes, each with
> auto_bootstrap:true. I can create a keyspace with RF=3, but I can't
> insert any data in it. That didn't happen with the old cluster, which
> made me think. How could I insert data in the old cluster in a keyspace
> with RF=3 but with just 2 nodes? I found out that the cluster had 3
> nodes for a short time in the past. We had to remove/return one node, but
> that was enough for the cluster to accept writes to a keyspace with RF=3
> even with just 2 nodes.
>
> So I tried to recreate the cluster state:
>
> I have 4 clean servers, cassandra 0.7.3, auto_bootstrap:true
>
> 1) setup & run node1 - success
>
> 2) create keyspace Context with rf=3 and create CF Url via
> cassandra-cli - success
>
> 3) list Url - Internal error processing get_range_slices
> node1:
> ERROR 09:46:28,725 Internal error processing get_range_slices
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (1)
>
> 4) setup & run node2 - success
>
> 5) list Url on node1 - Internal error processing get_range_slices
> node1:
> ERROR 09:46:28,725 Internal error processing get_range_slices
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (1)
>
> 6) list Url on node2 - Internal error processing get_range_slices
> node2:
> ERROR 09:50:54,231 Internal error processing get_range_slices
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>
> 7) insert on node1 - Internal error processing insert
> node1:
> ERROR 09:53:11,669 Internal error processing insert
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>
> 8) insert on node2 - Internal error processing insert
> node2:
> ERROR 09:53:54,833 Internal error processing insert
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>
> 9) setup & run node3 - success
>
> 10) list Url on node1 - success
>
> 11) insert in Url on node1 - success
>
> 12) stop cassandra on node3 - success
>
> 13) list & insert on node1&2 - success
>
> 14) loadbalance on node1 - Exception in thread "main"
> java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>
> 15) setup & run node4 - success
>
> 16) list Url on node4 - success BUT
> node4:
> ERROR 10:05:38,452 Fatal exception in thread Thread[RequestResponseStage:1,5,main]
> java.lang.AssertionError
>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> ERROR 10:05:38,462 Fatal exception in thread Thread[RequestResponseStage:17,5,main]
> java.lang.AssertionError
>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> 17) loadbalance on node1 - success
>
> 18) list Url on node4 - success BUT
> node4:
> ERROR 10:09:58,251 Fatal exception in thread Thread[RequestResponseStage:18,5,main]
> java.lang.AssertionError
>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> ERROR 10:09:58,257 Fatal exception in thread Thread[RequestResponseStage:5,5,main]
> java.lang.AssertionError
>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> 19) repair on node4 - after a long, long wait I killed it; none of the
> nodes reported any error
>
> 20) list Url on node1 - success BUT
> node1:
> ERROR 10:18:53,715 Fatal exception in thread Thread[RequestResponseStage:6,5,main]
> java.lang.AssertionError
>     at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> So, I can't get the cluster into the state it was in before I reinstalled
> it, where I couldn't bootstrap a new node. I hope it was just a
> combination of cassandra upgrades and lots of schema changes and that
> it won't happen in production. OTOH there is the AssertionError, which
> doesn't look good, but I can insert/retrieve the data.
>
> Regards,
> Patrik
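
For readers following the steps above, the IllegalStateException in steps 3 to 8 has the shape of a simple guard: the request is rejected when the keyspace's replication factor is larger than the number of endpoints the node knows about. The sketch below only illustrates that kind of guard. It is not the Cassandra source, and the class and method names (ReplicationCheck, requireEnoughEndpoints) are hypothetical; only the message text is taken from the logs.

```java
// Illustrative sketch only; NOT the Cassandra source. All names here are hypothetical.
final class ReplicationCheck {

    // Throws if a keyspace asks for more replicas than there are endpoints to hold them.
    // The message format mirrors the one seen in the quoted logs.
    static void requireEnoughEndpoints(int replicationFactor, int endpointCount) {
        if (replicationFactor > endpointCount) {
            throw new IllegalStateException(
                "replication factor (" + replicationFactor
                + ") exceeds number of endpoints (" + endpointCount + ")");
        }
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        // Walk the endpoint count from one node up to three, as in the reproduction.
        for (int endpointCount = 1; endpointCount <= 3; endpointCount++) {
            try {
                requireEnoughEndpoints(replicationFactor, endpointCount);
                System.out.println(endpointCount + " endpoints: ok");
            } catch (IllegalStateException e) {
                System.out.println(endpointCount + " endpoints: " + e.getMessage());
            }
        }
    }
}
```

With RF=3 the guard rejects one and two endpoints and accepts three, which matches steps 3 to 11. It does not explain why reads and writes keep working after node3 is stopped in steps 12 and 13, nor the AssertionError in ReadCallback.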