Re: problem with bootstrap

Patrik Modesto Fri, 11 Mar 2011 01:25:43 -0800

Unfortunately I can't provide the info, I deleted it. It was in a very
strange state.


I started with a new cluster today, 2 nodes, each with
auto_bootstrap:true. I can create a keyspace with RF=3, but I can't
insert any data into it. That didn't happen with the old cluster, which
made me think: how could I insert data in the old cluster into a
keyspace with RF=3 but with just 2 nodes? I found out that the cluster
had 3 nodes for a short time in the past. We had to remove/return one
node, but that was enough for the cluster to accept writes to a
keyspace with RF=3 even with just 2 nodes.

So I tried to recreate the cluster state:

I have 4 clean servers, Cassandra 0.7.3, auto_bootstrap:true

1) setup & run node1 - success

2) create keyspace Context with rf=3 and create CF Url via
cassandra-cli - success

3) list Url - Internal error processing get_range_slices
node1:
ERROR 09:46:28,725 Internal error processing get_range_slices
java.lang.IllegalStateException: replication factor (3) exceeds number
of endpoints (1)

4) setup & run node2 - success

5) list Url on node1 - Internal error processing get_range_slices
node1:
ERROR 09:46:28,725 Internal error processing get_range_slices
java.lang.IllegalStateException: replication factor (3) exceeds number
of endpoints (1)

6) list Url on node2 - Internal error processing get_range_slices
node2:
ERROR 09:50:54,231 Internal error processing get_range_slices
java.lang.IllegalStateException: replication factor (3) exceeds number
of endpoints (2)

7) insert on node1 - Internal error processing insert
node1:
ERROR 09:53:11,669 Internal error processing insert
java.lang.IllegalStateException: replication factor (3) exceeds number
of endpoints (2)

8) insert on node2 - Internal error processing insert
node2:
ERROR 09:53:54,833 Internal error processing insert
java.lang.IllegalStateException: replication factor (3) exceeds number
of endpoints (2)

9) setup & run node3 - success

10) list Url on node1 - success

11) insert in Url on node1 - success

12) stop cassandra on node3 - success

13) list & insert on node1&2 - success

14) loadbalance on node1 - Exception in thread "main"
java.lang.IllegalStateException: replication factor (3) exceeds number
of endpoints (2)

15) setup & run node4 - success

16) list Url on node4 - success BUT
node4:
ERROR 10:05:38,452 Fatal exception in thread Thread[RequestResponseStage:1,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR 10:05:38,462 Fatal exception in thread Thread[RequestResponseStage:17,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

17) loadbalance on node1 - success

18) list Url on node4 - success BUT
node4:
ERROR 10:09:58,251 Fatal exception in thread Thread[RequestResponseStage:18,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR 10:09:58,257 Fatal exception in thread Thread[RequestResponseStage:5,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

19) repair on node4 - after a long, long wait I killed it; none of the
nodes reported any error

20) list Url on node1 - success BUT
node1:
ERROR 10:18:53,715 Fatal exception in thread Thread[RequestResponseStage:6,5,main]
java.lang.AssertionError
        at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
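
Steps 3 to 13 look consistent with a check that compares RF against the
number of nodes the ring knows about (including stopped ones), not the
number of nodes currently up. Below is a minimal sketch of that reading.
It is a toy model under that assumption, not Cassandra's actual code;
ToyRing, join, stop, check and replay are made-up names.

```python
class ToyRing:
    """Toy model: tracks nodes known to the ring and which of them are up."""

    def __init__(self, replication_factor):
        self.replication_factor = replication_factor
        self.known = set()  # nodes that have joined the ring (up or down)
        self.up = set()     # nodes currently running

    def join(self, node):
        self.known.add(node)
        self.up.add(node)

    def stop(self, node):
        # Assumption: a stopped node stays in the ring; only liveness changes.
        self.up.discard(node)

    def check(self):
        # Assumption: RF is compared with known endpoints, not live ones.
        if self.replication_factor > len(self.known):
            raise RuntimeError(
                "replication factor (%d) exceeds number of endpoints (%d)"
                % (self.replication_factor, len(self.known)))
        return True


def replay():
    """Replay: start node1, node2, node3, then stop node3 (RF=3)."""
    ring = ToyRing(replication_factor=3)
    results = []
    for action, node in [("join", "node1"), ("join", "node2"),
                         ("join", "node3"), ("stop", "node3")]:
        getattr(ring, action)(node)
        try:
            ring.check()
            results.append("success")
        except RuntimeError as err:
            results.append(str(err))
    return results


if __name__ == "__main__":
    for line in replay():
        print(line)
```

In this model the check fails with 1 and 2 known nodes, passes once
node3 has joined, and keeps passing after node3 is stopped, which
matches what steps 3 to 13 show.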

So, I can't get the cluster back to the state it was in before I
reinstalled it, where I couldn't bootstrap a new node. I hope it was
just a combination of cassandra upgrades and lots of schema changes and
that it won't happen in production. OTOH there is the AssertionError,
which doesn't look good, but I can insert/retrieve the data.

Regards,
Patrik
