Re: problem with bootstrap

Aaron Morton Fri, 11 Mar 2011 13:32:15 -0800

IMHO being able to create a keyspace with an RF higher than the number of
nodes sounds like a bug. It puts the cluster into a bad place. It may even be
a regression; I will take a look at the code.
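
For illustration, the kind of guard being discussed could look roughly like the sketch below. It is not Cassandra's code: `validate_replication_factor` and its parameters are hypothetical names, and only the error text is taken from the log in the quoted message. The idea is to reject the keyspace definition up front instead of failing later on every read and write.

```python
def validate_replication_factor(replication_factor: int, endpoint_count: int) -> None:
    """Hypothetical schema-time check: refuse an RF the ring cannot satisfy."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > endpoint_count:
        # Same wording as the IllegalStateException in the quoted log.
        raise ValueError(
            f"replication factor ({replication_factor}) exceeds number "
            f"of endpoints ({endpoint_count})"
        )


# RF=3 on a single-node ring is rejected at creation time.
try:
    validate_replication_factor(3, 1)
except ValueError as err:
    print(err)
```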


The assertion is interesting. Can you reproduce it with logging at debug and 
post the results? Could you try to reproduce it with a clean cluster?

Thanks
Aaron


On 11/03/2011, at 10:24 PM, Patrik Modesto <patrik.mode...@gmail.com> wrote:

> Unfortunately I can't provide the info, I deleted it. It was in a very
> strange state.
> 
> I started with a new cluster today, 2 nodes, each with
> auto_bootstrap:true. I can create a keyspace with RF=3, but I can't
> insert any data into it. That didn't happen with the old cluster, which
> made me think: how could I insert data into a keyspace with RF=3 on the
> old cluster with just 2 nodes? I found out that the cluster had 3
> nodes for a short time in the past. We had to remove/return one node, but
> that was enough for the cluster to accept writes to a keyspace with RF=3
> even with just 2 nodes.
> 
> So I tried to recreate the cluster state:
> 
> I have 4 clean servers, Cassandra 0.7.3, auto_bootstrap:true
> 
> 1) setup & run node1 - success
> 
> 2) create keyspace Context with rf=3 and create CF Url via
> cassandra-cli - success
> 
> 3) list Url - Internal error processing get_range_slices
> node1:
> ERROR 09:46:28,725 Internal error processing get_range_slices
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (1)
> 
> 4) setup & run node2 - success
> 
> 5) list Url on node1 - Internal error processing get_range_slices
> node1:
> ERROR 09:46:28,725 Internal error processing get_range_slices
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (1)
> 
> 6) list Url on node2 - Internal error processing get_range_slices
> node2:
> ERROR 09:50:54,231 Internal error processing get_range_slices
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (2)
> 
> 7) insert on node1 - Internal error processing insert
> node1:
> ERROR 09:53:11,669 Internal error processing insert
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (2)
> 
> 8) insert on node2 - Internal error processing insert
> node2:
> ERROR 09:53:54,833 Internal error processing insert
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (2)
> 
> 9) setup & run node3 - success
> 
> 10) list Url on node1 - success
> 
> 11) insert in Url on node1 - success
> 
> 12) stop cassandra on node3 - success
> 
> 13) list & insert on node1&2 - success
> 
> 14) loadbalance on node1 - Exception in thread "main"
> java.lang.IllegalStateException: replication factor (3) exceeds number
> of endpoints (2)
> 
> 15) setup & run node4 - success
> 
> 16) list Url on node4 - success BUT
> node4:
> ERROR 10:05:38,452 Fatal exception in thread Thread[RequestResponseStage:1,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR 10:05:38,462 Fatal exception in thread Thread[RequestResponseStage:17,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 
> 17) loadbalance on node1 - success
> 
> 18) list Url on node4 - success BUT
> node4:
> ERROR 10:09:58,251 Fatal exception in thread Thread[RequestResponseStage:18,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> ERROR 10:09:58,257 Fatal exception in thread Thread[RequestResponseStage:5,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 
> 19) repair on node4 - after a very long wait I killed it; none of the
> nodes reported any error
> 
> 20) list Url on node1 - success BUT
> node1:
> ERROR 10:18:53,715 Fatal exception in thread Thread[RequestResponseStage:6,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 
> So, I can't get the cluster back into the state it was in before I
> reinstalled it, where I couldn't bootstrap a new node. I hope it was just
> a combination of Cassandra upgrades and lots of schema changes and that
> it won't happen in production. OTOH there is the AssertionError, which
> doesn't look good, but I can insert/retrieve the data.
> 
> Regards,
> Patrik
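
One reading of steps 1-17 is that the replication factor is compared with the number of nodes that are members of the ring, not with the number of nodes that are currently up. The toy model below is a sketch under that assumption only; the `Ring` class and its methods are hypothetical and are not Cassandra APIs. It also assumes that loadbalance takes the node out of the ring before it rejoins, so the remaining members must still satisfy the RF.

```python
class Ring:
    """Toy model: ring membership is tracked separately from liveness."""

    def __init__(self, replication_factor):
        self.replication_factor = replication_factor
        self.members = set()  # nodes that own a place in the ring
        self.down = set()     # members that are currently stopped

    def join(self, node):
        self.members.add(node)

    def stop(self, node):
        # A stopped node stays a ring member; it is only marked down.
        self.down.add(node)

    def can_serve(self):
        # Assumed check: compare RF with ring members, not live nodes.
        return self.replication_factor <= len(self.members)

    def can_leave(self, node):
        # Assumed loadbalance precondition: the members left behind
        # must still satisfy the replication factor.
        return self.replication_factor <= len(self.members - {node})


ring = Ring(replication_factor=3)
ring.join("node1")
ring.join("node2")
print(ring.can_serve())         # two members, RF=3 (steps 5-8)
ring.join("node3")
ring.stop("node3")
print(ring.can_serve())         # node3 is down but still a member (step 13)
print(ring.can_leave("node1"))  # loadbalance with three members (step 14)
ring.join("node4")
print(ring.can_leave("node1"))  # loadbalance with four members (step 17)
```

Under these assumptions the model reproduces the pattern reported above: requests fail with two members, succeed once a third node has joined even after it is stopped, and loadbalance on node1 only succeeds after node4 has joined. It says nothing about the AssertionError in ReadCallback.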
