I have a 6-node Cassandra cluster (DC1=3, DC2=3) with 60 GB of data on each node. I was bulk loading data over the weekend, but we forgot to turn off the weekly nodetool repair job, so repair was interfering with the bulk load. I canceled the repair by restarting the nodes. Unfortunately, after the restart it looked like I didn't have any data on those nodes when I ran list in cassandra-cli. I ran repair on one of the affected nodes, but it seemed to be taking forever and disk usage almost tripled. I stopped the repair again for fear of running out of disk space. After the restart, disk usage on that node is at 50%, whereas the good nodes are at 25%.

How should I proceed from here? When I run list in cassandra-cli now, I do see data on the affected node, but how can I be sure I have all the data? Should I run repair again? Should I clean up the disk by clearing snapshots? Or should I just drop the column families and bulk load the data again?
Thanks -Raj