If I was supposed to run a repair after upping the RF, then not doing so is probably what caused the data loss. Wish I had been more careful.
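For the archive, here is the sequence I now understand should have been followed when we raised the RF. The keyspace name and strategy below are placeholders for ours, so treat this as a sketch rather than exactly what we ran:

-- in cqlsh: raise the replication factor first (keyspace name is a placeholder)
ALTER KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

# then repair every node in turn so the new replicas actually receive the data
nodetool -h cass1 repair mykeyspace
nodetool -h cass2 repair mykeyspace
nodetool -h cass3 repair mykeyspace

# only after repair finishes on every node would it have been safe to start cassandra-shuffle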
I'm guessing the data is irrevocably lost; I didn't make any snapshots. Would it be possible to figure out whether only a certain part of the ring was affected? That would help in figuring out what data was lost. I've done a full repair now, so I'm also guessing the inconsistent data is now completely gone as well, right?

On Sunday, June 9, 2013 at 10:37 AM, Edward Capriolo wrote:

> Sounds like your cluster got shufflef*cked.
> You said: "After we had gotten all the data moved over we decided to add 2
> more nodes, as well as up the RF to 2."
>
> After you raise replication you have to run repair on all nodes. If you did
> not, and then you proceeded to shuffle, you will likely have data loss.
>
> If you did repair all nodes before the shuffle, I do not know; then the
> shuffle must have gone wrong. If you're reading at CL.ALL and still seeing
> inconsistencies, that is bad. Possibly try raising the read repair chance to
> 100% and continue reading to see if the data becomes consistent (though I do
> not know why repair would not do it).
>
> On Sat, Jun 8, 2013 at 8:56 PM, Nimi Wariboko Jr <nimiwaribo...@gmail.com> wrote:
> > Hi,
> >
> > We are seeing an issue where data that was written to the cluster is no
> > longer accessible after trying to expand the size of the cluster. I will
> > try to provide as much information as possible; I am just starting out with
> > Cassandra and I'm not entirely sure what data is relevant.
> >
> > All Cassandra nodes are 1.2.5, and each node has the same config.
> >
> > We started out moving our entire data set to a single Cassandra node. This
> > node was initially set up with Initial Token: 0, as well as other default
> > settings. After we had gotten all the data moved over we decided to add 2
> > more nodes, as well as up the RF to 2. We also decided to start using
> > vnodes, which meant setting num_tokens to 256 and removing the initial token
> > param. We then decided to run cassandra-shuffle as well.
> >
> > During cassandra-shuffle we started to notice some rows were disappearing
> > and then reappearing, and other rows haven't come back at all. I decided to
> > stop the shuffle, repair each node, and then restart the cluster; however, all
> > the data hasn't come back. Note that this is with CONSISTENCY ALL.
> >
> > Here is my `nodetool status`. What is weird here is the token distribution of
> > 260-239-1. I'm not an expert, but I believe it should be 256-256-256, or at
> > least add up to 768.
> >
> > Datacenter: 129
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address       Load       Tokens  Owns   Host ID                               Rack
> > UN  10.129.196.4  371.56 GB  260     38.1%  cde6c3be-a066-47f2-abc2-b1d78bee0d7c  196
> > UN  10.129.196.5  212.64 GB  239     61.5%  2cb24510-2f89-46b2-96b9-873f8e8e50da  196
> > UN  10.129.196.6  256.05 GB  1       0.4%   ce8d4ea9-8106-44b3-a2dd-c0230eb53c94  196
> >
> > (http://pastebin.com/37SwNaGq)
> >
> > And here is the OpsCenter ring view (http://imgur.com/VssmFlw)
> >
> > What's also weird is that the token count from `nodetool -h [host] info` differs
> > from `nodetool status`.
> > Example:
> >
> > root@cass1:~# nodetool -h cass1 info | grep Token
> > Token : (invoke with -T/--tokens to see all 239 tokens)
> > root@cass1:~# nodetool -h cass2 info | grep Token
> > Token : (invoke with -T/--tokens to see all 269 tokens)
> > root@cass1:~# nodetool -h cass3 info | grep Token
> > Token : (invoke with -T/--tokens to see all 260 tokens)
> >
> > (Full output: http://pastebin.com/2hxpArt0)
> >
> > I believe it has something to do with the cluster not "seeing" all the
> > tokens, but I am not sure where to continue from here. I don't believe any
> > data was lost; there was no power outage, and all the data should have been
> > committed to disk before we added the two other nodes.
> >
> > Thanks,
> > Nimi
> > nimiwaribo...@gmail.com
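PS: to try to answer my own question about whether only part of the ring was affected, I'm planning to spot-check a few keys I know are missing against a few that still read fine and ask the cluster which nodes own them. Keyspace/column family/key names below are just placeholders for ours:

# which replicas own a given partition key
nodetool -h cass1 getendpoints mykeyspace mycolumnfamily some_missing_key
nodetool -h cass1 getendpoints mykeyspace mycolumnfamily some_intact_key

# list every token a node actually holds, to compare against what `nodetool status` reports
nodetool -h cass1 info -T
nodetool -h cass1 ring

If the missing keys all map to the same node, or to tokens that show up in `info -T` but not in `status`, that would at least narrow down which ranges the shuffle mangled.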