If I was supposed to run a repair after upping the RF, then not doing so is probably what caused the data loss. Wish I had been more careful.
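For the archive, here is the sequence I now understand should have been followed when we raised the RF. The keyspace name and strategy below are placeholders for ours, so treat this as a sketch rather than exactly what we ran:

-- in cqlsh: raise the replication factor first (keyspace name is a placeholder)
ALTER KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

# then repair every node in turn so the new replicas actually receive the data
nodetool -h cass1 repair mykeyspace
nodetool -h cass2 repair mykeyspace
nodetool -h cass3 repair mykeyspace

# only after repair finishes on every node would it have been safe to start cassandra-shuffle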
I'm guessing the data is irrevocably lost; I didn't make any snapshots. Would it be possible to figure out whether only a certain part of the ring was affected? That would help in figuring out what data was lost. I've done a full repair now, so I'm also guessing the inconsistent data is now completely gone as well, right?

On Sunday, June 9, 2013 at 10:37 AM, Edward Capriolo wrote:

> Sounds like your cluster got shufflef*cked.
> You said: "After we had gotten all the data moved over we decided to add 2
> more nodes, as well as up the RF to 2."
>
> After you raise replication you have to run repair on all nodes. If you did
> not, and then you proceeded to shuffle, you will likely have data loss.
>
> If you did repair all nodes before the shuffle, I do not know; then the
> shuffle must have gone wrong. If you're reading at CL.ALL and still seeing
> inconsistencies, that is bad. Possibly try raising the read repair chance to
> 100% and continue reading to see if the data becomes consistent (though I do
> not know why repair would not do it).
>
> On Sat, Jun 8, 2013 at 8:56 PM, Nimi Wariboko Jr <nimiwaribo...@gmail.com> wrote:
> > Hi,
> >
> > We are seeing an issue where data that was written to the cluster is no
> > longer accessible after trying to expand the size of the cluster. I will
> > try to provide as much information as possible; I am just starting out with
> > Cassandra and I'm not entirely sure what data is relevant.
> >
> > All Cassandra nodes are 1.2.5, and each node has the same config.
> >
> > We started out moving our entire data set to a single Cassandra node. This
> > node was initially set up with Initial Token: 0, as well as other default
> > settings. After we had gotten all the data moved over we decided to add 2
> > more nodes, as well as up the RF to 2. We also decided to start using
> > vnodes, which meant setting num_tokens to 256 and removing the initial token
> > param. We then decided to run cassandra-shuffle as well.
> >
> > During cassandra-shuffle we started to notice some rows were disappearing
> > and then reappearing, and other rows haven't come back at all. I decided to
> > stop the shuffle, repair each node, and then restart the cluster; however, all
> > the data hasn't come back. Note that this is with CONSISTENCY ALL.
> >
> > Here is my `nodetool status`. What is weird here is the token distribution of
> > 260-239-1. I'm not an expert, but I believe it should be 256-256-256, or at
> > least add up to 768.
> >
> > Datacenter: 129
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address       Load       Tokens  Owns   Host ID                               Rack
> > UN  10.129.196.4  371.56 GB  260     38.1%  cde6c3be-a066-47f2-abc2-b1d78bee0d7c  196
> > UN  10.129.196.5  212.64 GB  239     61.5%  2cb24510-2f89-46b2-96b9-873f8e8e50da  196
> > UN  10.129.196.6  256.05 GB  1       0.4%   ce8d4ea9-8106-44b3-a2dd-c0230eb53c94  196
> >
> > (http://pastebin.com/37SwNaGq)
> >
> > And here is the OpsCenter ring view (http://imgur.com/VssmFlw)
> >
> > What's also weird is that the token count from `nodetool -h [host] info` differs
> > from `nodetool status`.
> > Example:
> >
> > root@cass1:~# nodetool -h cass1 info | grep Token
> > Token : (invoke with -T/--tokens to see all 239 tokens)
> > root@cass1:~# nodetool -h cass2 info | grep Token
> > Token : (invoke with -T/--tokens to see all 269 tokens)
> > root@cass1:~# nodetool -h cass3 info | grep Token
> > Token : (invoke with -T/--tokens to see all 260 tokens)
> >
> > (Full output: http://pastebin.com/2hxpArt0)
> >
> > I believe it has something to do with the cluster not "seeing" all the
> > tokens, but I am not sure where to continue from here. I don't believe any
> > data was lost; there was no power outage, and all the data should have been
> > committed to disk before we added the two other nodes.
> >
> > Thanks,
> > Nimi
> > nimiwaribo...@gmail.com
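PS: to try to answer my own question about whether only part of the ring was affected, I'm planning to spot-check a few keys I know are missing against a few that still read fine and ask the cluster which nodes own them. Keyspace/column family/key names below are just placeholders for ours:

# which replicas own a given partition key
nodetool -h cass1 getendpoints mykeyspace mycolumnfamily some_missing_key
nodetool -h cass1 getendpoints mykeyspace mycolumnfamily some_intact_key

# list every token a node actually holds, to compare against what `nodetool status` reports
nodetool -h cass1 info -T
nodetool -h cass1 ring

If the missing keys all map to the same node, or to tokens that show up in `info -T` but not in `status`, that would at least narrow down which ranges the shuffle mangled.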