You should run repair. If disk space is the problem, try running cleanup and a major compaction before the repair. You can limit the amount of data streamed at once by running repair on each column family separately.
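As a rough sketch of that order of operations (cleanup, then major compaction, then one repair per column family), the small Python helper below only builds the nodetool command lines and prints them; it does not run anything. The host name, keyspace name and column family names are placeholders, not values from this thread, and it assumes nodetool is on the PATH and accepts the usual `cleanup`, `compact` and `repair` subcommands with an optional keyspace and column family.

```python
import shlex


def repair_plan(host, keyspace, column_families):
    """Build nodetool command lines: cleanup, major compaction,
    then one repair per column family (to limit streaming per run)."""
    base = ["nodetool", "-h", host]
    plan = [
        base + ["cleanup", keyspace],   # drop data the node no longer owns
        base + ["compact", keyspace],   # major compaction to reclaim space
    ]
    for cf in column_families:
        # Repairing one column family at a time keeps each streaming
        # session smaller than a whole-keyspace repair.
        plan.append(base + ["repair", keyspace, cf])
    return plan


if __name__ == "__main__":
    # Placeholder names; substitute your own node, keyspace and column families.
    for argv in repair_plan("node1.example.com", "MyKeyspace", ["Users", "Events"]):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Run the printed commands one at a time and watch disk usage between steps, rather than firing them all off in a loop.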
maki

On 2012/04/28, at 23:47, Raj N <raj.cassan...@gmail.com> wrote:

> I have a 6 node cassandra cluster DC1=3, DC2=3 with 60 GB data on each node.
> I was bulk loading data over the weekend. But we forgot to turn off the
> weekly nodetool repair job. As a result, repair was interfering when we were
> bulk loading data. I canceled repair by restarting the nodes. But
> unfortunately after the restart it looks like I dont have any data on those
> nodes when I use list on cassandra-cli. I ran repair on one of the effected
> nodes, but repair seems to be taking forever. Disk space has almost tripled.
> I stopped the repair again in fear of running out of disk space. After
> restart, the disk space is at 50% where as the good nodes are at 25%. How
> should I proceed from here. When I run list on cassandra-cli I do see data
> on the effected node. But how can I be sure I have all the data. Should I run
> repair again. Should I cleanup the disk by clearing snapshots. Or should I
> just drop column families and bulk load the data again?
>
> Thanks
> -Raj