SchemaDisagreementError when launching a new Cassandra (1.2.2) cluster ?

Alex Heneveld Mon, 09 Sep 2013 08:08:56 -0700


Hi folks,

I'm occasionally seeing SchemaDisagreementError on the boot of a *new* cluster. I'm hoping someone can explain what I'm doing wrong, or help me track down the bug if it is one.

The problem occurs in about 1 in 4 launches when I start a 2-node cluster, where the two machines are configured identically with both nodes as the seeds (apart from the listen_address being different). On the problematic launches, describing schema versions immediately after start shows that the two nodes have different schemas (reported at both nodes) and any attempt to work with the nodes returns the SDE. This is before I attempt to do anything to the cluster. After ~60s the nodes reconcile their differences, report a single schema used at both nodes, and I can use the cluster without problems.


Key points:

* The problem usually fixes itself 60s after startup (almost exactly; I poll every second)

* The problem is intermittent, occurring on between 10% and 50% of launches (failure rates seem higher at peak cloud times -- so possibly linked to background CPU/network/storage contention)

* For the problem period (the initial 60s), peer size is reported as 2, and both nodes report the same schema versions map containing two schemas, each with one of the nodes against it (after 60s the map contains one schema with both nodes)

* In some of the problematic launches, it takes ~120s to reconcile, where for the first 60s the nodes do not seem to see each other at all (each reports peer size 1, and a single schema used by only one node (itself)), then for the next 60s the problem is as described above (disagreeing schemas); again the 60s/120s seems meaningfully precise

* The problem occurs whether the two nodes are launched simultaneously or are launched with a delay between the two
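
The once-per-second polling described in the points above can be sketched roughly as follows. This is only an illustration under assumptions, not the code used in these tests: `fetch_versions` is a hypothetical callable standing in for whatever client call returns the schema versions map (schema version mapped to the list of nodes reporting it), and the function names and timeout are made up for the example.

```python
import time
from typing import Callable, Dict, List, Optional

# Assumed shape of the schema versions map: schema version -> nodes on it.
SchemaVersions = Dict[str, List[str]]


def schema_agreed(versions: SchemaVersions, expected_nodes: int) -> bool:
    """True when one schema version is reported and it lists every expected node."""
    if len(versions) != 1:
        return False
    (hosts,) = versions.values()
    return len(set(hosts)) == expected_nodes


def wait_for_schema_agreement(
    fetch_versions: Callable[[], SchemaVersions],  # hypothetical client hook
    expected_nodes: int,
    timeout_s: float = 180.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the schema agrees; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        if schema_agreed(fetch_versions(), expected_nodes):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

With a check like this, a launcher could hold back any schema-changing work until the function returns a value, and log that value to see whether the 60s/120s pattern holds across launches.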

I have a workaround, which is to use just one node to seed this initial set. When the set of seeds has cardinality 1, the problem does not occur. However the advice is to use 2 seeds and have them be the same across the cluster -- so I'd like to get to the bottom of this!
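
To make the two seed configurations concrete, here is a small sketch that renders the seed_provider fragment of cassandra.yaml for a given seed list. The helper name and the IP addresses are placeholders invented for illustration; the fragment layout follows the stock cassandra.yaml with SimpleSeedProvider, and this is not necessarily how Brooklyn generates the file.

```python
from typing import Sequence


def render_seed_provider(seeds: Sequence[str]) -> str:
    """Render the seed_provider section of cassandra.yaml for the given seeds."""
    if not seeds:
        raise ValueError("at least one seed address is required")
    seed_list = ",".join(seeds)  # seeds are a single comma-separated string
    return (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        f'          - seeds: "{seed_list}"\n'
    )


# Placeholder addresses, not taken from the real cluster.
print(render_seed_provider(["10.0.0.1"]))              # workaround: one seed
print(render_seed_provider(["10.0.0.1", "10.0.0.2"]))  # both nodes as seeds
```

The first call corresponds to the workaround and the second to the configuration that shows the intermittent disagreement; in both cases the same fragment would go on every node.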

I'd also like to be sure that any subsequent nodes added to the cluster aren't going to cause the same problem when we start using it!

I am running Cassandra 1.2.2 in Amazon, using Brooklyn (brooklyn.io) to start and manage it. I can share test cases, cassandra.yaml, logs, etc -- but am starting with the above summary in case anyone can point me in the right direction from that.


Thanks,
Alex
