TL;DR: you need to run repair in between those two steps. Full explanation:
https://issues.apache.org/jira/browse/CASSANDRA-2434
https://issues.apache.org/jira/browse/CASSANDRA-5901
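A minimal sketch of the safer order of operations, assuming `nodetool` is on the PATH on each node; `mykeyspace` is a placeholder keyspace name, not from your setup:

```shell
# 1. Bring up the new nodes one at a time and let each finish joining
#    the ring before starting the next.

# 2. BEFORE decommissioning anything, repair each node so every replica
#    actually holds the data it is now responsible for (run per node):
nodetool repair mykeyspace

# 3. Only after repair completes everywhere, decommission the old nodes,
#    one at a time (run on the node being removed):
nodetool decommission
```

Without the repair step, a new node that bootstrapped from only one replica can become the sole owner of ranges it never fully received, and decommissioning the old replicas then drops that data.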
Thanks,
-Jeremiah Jordan

On Nov 25, 2013, at 11:00 AM, Christopher J. Bottaro <cjbott...@academicworks.com> wrote:

> Hello,
>
> We recently experienced (pretty severe) data loss after moving our 4-node
> Cassandra cluster from one EC2 availability zone to another. Our strategy
> for doing so was as follows:
>
> 1. One at a time, bring up new nodes in the new availability zone and
>    have them join the cluster.
> 2. One at a time, decommission the old nodes in the old availability zone
>    and turn them off (stop the Cassandra process).
>
> Everything seemed to work as expected. As we decommissioned each node, we
> checked the logs for messages indicating "yes, this node is done
> decommissioning" before turning the node off.
>
> Pretty quickly after the old nodes left the cluster, we started getting
> client calls about missing data.
>
> We immediately turned the old nodes back on, and when they rejoined the
> cluster *most* of the reported missing data returned. For the rest of the
> missing data, we had to spin up a new cluster from EBS snapshots and copy
> it over.
>
> What did we do wrong?
>
> In hindsight, we noticed a few things which may be clues:
>
> - The new nodes had much lower load after joining the cluster than the
>   old ones (3-4 GB as opposed to 10 GB).
> - We have EC2Snitch turned on, although we're using SimpleStrategy for
>   replication.
> - The new nodes showed even ownership (via nodetool status) after joining
>   the cluster.
>
> Here's more info about our cluster:
>
> - Cassandra 1.2.10
> - Replication factor of 3
> - Vnodes with 256 tokens
> - All tables made via CQL
> - Data dirs on EBS (yes, we are aware of the performance implications)
>
> Thanks for the help.