Sounds like your cluster got shufflef*cked. You said: "After we had gotten all the data moved over we decided to add 2 more nodes, as well as up the RF to 2."
After you raise the replication factor you have to run repair on all nodes. If you did not, and then proceeded to shuffle, you will likely have data loss. If you did repair all nodes before the shuffle, then I do not know; the shuffle must have gone wrong. If you are reading at CL.ALL and still seeing inconsistencies, that is bad. Possibly try raising the read repair chance to 100% and continue reading to see if the data becomes consistent (though I do not know why repair would not do it). A rough sketch of the commands I mean is at the bottom of this message, below your quoted mail.

On Sat, Jun 8, 2013 at 8:56 PM, Nimi Wariboko Jr <nimiwaribo...@gmail.com> wrote:

> Hi,
>
> We are seeing an issue where data that was written to the cluster is no
> longer accessible after trying to expand the size of the cluster. I will
> try to provide as much information as possible; I am just starting out with
> Cassandra and I'm not entirely sure what data is relevant.
>
> All Cassandra nodes are 1.2.5, and each node has the same config.
>
> We started out moving our entire data set to a single Cassandra node. This
> node was initially set up with initial_token: 0, as well as other default
> settings. After we had gotten all the data moved over we decided to add 2
> more nodes, as well as up the RF to 2. We also decided to start using
> vnodes, which meant setting num_tokens to 256 and removing the initial_token
> param. We then decided to run cassandra-shuffle as well.
>
> During cassandra-shuffle we started to notice some rows were disappearing
> and then reappearing, and other rows haven't come back at all. I decided to
> stop the shuffle and repair each node, then restart the cluster; however, all
> the data hasn't come back. Note that this is at CONSISTENCY ALL.
>
> Here is my `nodetool status`. What is weird here is the token distribution:
> 260-239-1. I'm not an expert, but I believe it should be 256-256-256, or at
> least add up to 768.
>
> Datacenter: 129
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load       Tokens  Owns   Host ID                               Rack
> UN  10.129.196.4  371.56 GB  260     38.1%  cde6c3be-a066-47f2-abc2-b1d78bee0d7c  196
> UN  10.129.196.5  212.64 GB  239     61.5%  2cb24510-2f89-46b2-96b9-873f8e8e50da  196
> UN  10.129.196.6  256.05 GB  1       0.4%   ce8d4ea9-8106-44b3-a2dd-c0230eb53c94  196
>
> (http://pastebin.com/37SwNaGq)
>
> And here is the OpsCenter ring view (http://imgur.com/VssmFlw)
>
> What is also weird is that the token count from nodetool -h [host] info
> differs from status.
>
> Example:
> root@cass1:~# nodetool -h cass1 info | grep Token
> Token : (invoke with -T/--tokens to see all 239 tokens)
> root@cass1:~# nodetool -h cass2 info | grep Token
> Token : (invoke with -T/--tokens to see all 269 tokens)
> root@cass1:~# nodetool -h cass3 info | grep Token
> Token : (invoke with -T/--tokens to see all 260 tokens)
>
> (Full output: http://pastebin.com/2hxpArt0)
>
> I believe it has something to do with the cluster not "seeing" all the
> tokens, but I am not sure where to continue from here. I don't believe any
> data was lost; there was no power outage, and all the data should have been
> committed to disk before we added the two other nodes.
>
> Thanks,
> Nimi
> nimiwaribo...@gmail.com
>
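P.S. Here is a rough, untested sketch of the commands I am suggesting. "mykeyspace" and "mytable" are placeholders; substitute your real keyspace and table names:

    # run this on EVERY node, one at a time, after any change to the replication factor
    nodetool -h <node_address> repair mykeyspace

    -- in cqlsh: bump the read repair chance on the affected table to 100%
    ALTER TABLE mykeyspace.mytable WITH read_repair_chance = 1.0;

    -- in cqlsh: read back at the strongest consistency level to check whether the data converges
    CONSISTENCY ALL;
    SELECT * FROM mykeyspace.mytable LIMIT 10;

If the rows come back consistently once repair has completed on all three nodes, you are probably fine; if they still flap, I would suspect the shuffle moved token ranges without the data behind them.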