I'm trying to simulate a DR scenario. I have successfully simulated the
'D', but my 'R' isn't quite right.

Quick background:
1.1) Cassandra 1.2.13, static tokens (no vnodes), Ec2Snitch,
Murmur3Partitioner
1.2) For this simulation, a 2-node cluster with RF of 1 on AWS EC2 instances
1.3) Simulated data is just cassandra-stress output: 1,000,000 rows to
Keyspace1.Standard1
1.4) After cassandra-stress I run nodetool cfstats Keyspace1.Standard1 on
both nodes, and the Number of Keys (estimate) values add up to 1,000,000
(commands sketched below)
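
For reference, roughly the load-and-verify commands I mean (host names
are placeholders; this is the old pre-2.1 stress syntax):

    # write 1,000,000 rows into Keyspace1.Standard1
    cassandra-stress -d node1 -n 1000000

    # check the key-count estimate on each node
    nodetool -h node1 cfstats Keyspace1.Standard1 | grep 'Number of Keys'
    nodetool -h node2 cfstats Keyspace1.Standard1 | grep 'Number of Keys'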


My steps to archive and recover:
2.1) Snapshot each node's data and store it in AWS S3. Include a capture
of 'show schema' in the data uploaded to S3 (commands sketched after this
list).
2.2) Blow away both Cassandra nodes.
2.3) Start up two new nodes with the same configuration as initially
described.
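
Per node, the archive step looks roughly like this (the bucket name,
paths, and the dr-test snapshot tag are placeholder assumptions, not my
exact commands):

    # snapshot Keyspace1; the tag name is arbitrary
    nodetool snapshot -t dr-test Keyspace1

    # capture the schema alongside the data
    echo 'show schema;' | cassandra-cli -h localhost > schema.cdl

    # upload this node's snapshot files and the schema to S3
    s3cmd put --recursive \
        /var/lib/cassandra/data/Keyspace1/Standard1/snapshots/dr-test/ \
        s3://my-dr-bucket/node1/
    s3cmd put schema.cdl s3://my-dr-bucket/node1/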

After the two nodes are up and running, I do the following:
3.1) Recreate the Keyspace1 schema in the new cluster using the 'show
schema' output captured in step 2.1.
3.2) Stop the cassandra service on each running node.
3.3) Delete the files in the commitlog directory on each node.
3.4) Copy the snapshot data from S3 into the Cassandra SSTable directory
on each node, being careful to put the snapshot data from one old node on
the first new node and the snapshot data from the other old node on the
second new node.
3.5) Restart Cassandra, then run nodetool repair and nodetool cleanup on
each new node (roughly as sketched below).
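
In shell terms, the per-node recovery looks something like this (again,
the bucket, paths, and host names are placeholders):

    # 3.1) recreate the schema from the captured 'show schema' output
    cassandra-cli -h node1 -f schema.cdl

    # 3.2 / 3.3) stop the service and clear the commit log
    sudo service cassandra stop
    rm /var/lib/cassandra/commitlog/*

    # 3.4) pull this node's own snapshot back into the SSTable directory
    s3cmd get --recursive s3://my-dr-bucket/node1/ \
        /var/lib/cassandra/data/Keyspace1/Standard1/

    # 3.5) restart, then repair and clean up
    sudo service cassandra start
    nodetool repair
    nodetool cleanup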

Now everything should be good. However, if I run nodetool cfstats
Keyspace1.Standard1 on each box, the Number of Keys (estimate) looks
right on one box but the other one is several hundred thousand short.

Does anyone see anything wrong with my recovery steps?

-- 
John Pyeatt
Singlewire Software, LLC
www.singlewire.com
------------------
608.661.1184
john.pye...@singlewire.com
