Problems with adding datacenter and schema version disagreement

olek.stas...@gmail.com Tue, 11 Mar 2014 05:32:41 -0700

Hi All,
I've run into an issue with Cassandra 2.0.5.
I have a 6-node cluster with the random partitioner, still using tokens
instead of vnodes.
Because we're changing hardware, we decided to migrate the cluster to 6
new machines and switch the partitioning from token-based to vnodes.
I followed the instructions at:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
and started Cassandra on 6 new nodes in a new DC. Everything seemed to
work correctly; every node was seen by all the others as up and normal.
Then I ran nodetool repair -pr on the first of the new nodes.
But the process fell into an infinite loop, sending and receiving Merkle
trees over and over. It hung on one very small KS, and there was no sign
that it would ever stop (the process ran the whole night).
So I decided to stop the repair and restart Cassandra on this particular
new node. After the restart I tried the repair once more with another
small KS, but it also fell into an infinite loop.
So I decided to abort the procedure of adding the datacenter, remove the
nodes from the new DC and start all over from scratch.
After running removenode for all the new nodes, I wiped the data dir and
started Cassandra on a new node once again. During startup, messages like
"org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
cfId=98bb99a2-42f2-3fcd-af67-208a4faae5fa"
appeared in the logs. According to Google, they may indicate problems
with schema version consistency, so I ran describe cluster in
cassandra-cli and got:
Cluster Information:
Name: Metadata Cluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
76198f8b-663f-3434-8860-251ebc6f50c4: [150.254.164.4]


f48d3512-e299-3508-a29d-0844a0293f3a: [150.254.164.3]

16ad2e35-1eef-32f0-995c-e2cbd4c18abf: [150.254.164.6]

72352017-9b0d-3b29-8c55-ed86f30363c5: [150.254.164.1]

7f1faa84-0821-3311-9232-9407500591cc: [150.254.164.5]

85cd0ebc-5d33-3bec-a682-8c5880ee2fa1: [150.254.164.2]

So now I have 6 different schema versions in the cluster, one per node.
How could this have happened? How can I bring my cluster back to a
consistent state?
What did I do wrong while extending the cluster that made nodetool repair
fall into an infinite loop?
At first sight the data looks OK: I can read from the cluster and I'm
getting the expected output.
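
For reference, here is a minimal sketch of how the schema check above
could be automated, assuming the describe cluster output has been saved
as plain text. The function names are only illustrative and are not part
of Cassandra or cassandra-cli; the UNREACHABLE label is an assumption
about how the CLI lists nodes it cannot contact.

```python
import re
from typing import Dict, List

# Matches lines such as "76198f8b-...: [150.254.164.4]".
# The "UNREACHABLE" alternative is an assumption about the CLI output.
VERSION_LINE = re.compile(r"^\s*([0-9a-f-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(output: str) -> Dict[str, List[str]]:
    """Map each schema version found in the text to the nodes reporting it."""
    versions: Dict[str, List[str]] = {}
    for line in output.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            version, nodes = match.groups()
            versions.setdefault(version, []).extend(
                n.strip() for n in nodes.split(",") if n.strip()
            )
    return versions


def schema_agrees(versions: Dict[str, List[str]]) -> bool:
    """True only when all listed (reachable) nodes share one schema version."""
    reachable = [v for v in versions if v != "UNREACHABLE"]
    return len(reachable) == 1


# Two of the six lines from the output above, used as sample input.
SAMPLE = """\
Schema versions:
76198f8b-663f-3434-8860-251ebc6f50c4: [150.254.164.4]

f48d3512-e299-3508-a29d-0844a0293f3a: [150.254.164.3]
"""

if __name__ == "__main__":
    found = parse_schema_versions(SAMPLE)
    for version, nodes in found.items():
        print(version, nodes)
    print("schema agrees:", schema_agrees(found))
```

With the output above, every node ends up under its own version, so the
check reports disagreement; a healthy cluster would list all nodes under
a single version.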
best regards
Aleksander
