> As a side effect of the failed repair (so it seems) the disk usage on the
> affected node prevents compaction from working. It still works on
> the remaining nodes (we have 3 total).
> Is there a way to scrub the extraneous data?
This is one of the reasons why killing an in-progress repair is a bad thing :( If you do not have enough disk space for any kind of compaction to work, then no, unfortunately there is no easy way to get rid of the data. That said, there are a few options; rough sketches of each follow at the bottom of this mail.

You can go to the extra trouble of moving the entire node to some other machine (e.g. one firewalled off from the cluster) with more disk, running compaction there, and then "moving it back", but that is kind of painful to do.

Another option is to decommission the node and replace it. However, be aware that (1) this leaves the ring with less capacity for a while, and (2) when you decommission, the data streamed from that node to the others will be artificially inflated due to the repair, so there is some risk of "infecting" the other nodes with a large data set.

I should mention that if you have no traffic running against the cluster, one way is to just remove all the data and then run repair afterwards. But that implies that you trust that (1) no reads are going to the cluster (else you might serve reads based on missing data), and (2) you are comfortable with losing the data on the node. (2) might be okay if you are, e.g., writing at QUORUM at all times and have RF >= 3 (basically, this is as if the node had been lost to hardware breakage).

A faster way to reconstruct the node would be to delete the data from your keyspaces (except the system keyspace), start the node (now missing data), and run 'nodetool rebuild' once https://issues.apache.org/jira/browse/CASSANDRA-3483 is done. The patch attached to that ticket should work for 0.8.6, I suspect (but no guarantees). This also assumes you have no reads running against the cluster.
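Roughly, the move-it-elsewhere option would look like the following. This is only a sketch: it assumes a default package layout under /var/lib/cassandra with an init script, and 'bigbox' is a hypothetical spare machine with more disk; adjust for your setup.

    # Stop Cassandra on the affected node so the data files are quiescent.
    sudo /etc/init.d/cassandra stop

    # Copy data and config to the spare machine, which must be firewalled
    # off from the cluster so it cannot join the ring. You may need to
    # list the machine itself as a seed in its cassandra.yaml so it can
    # start up in isolation.
    rsync -av /var/lib/cassandra/ bigbox:/var/lib/cassandra/
    rsync -av /etc/cassandra/ bigbox:/etc/cassandra/

    # On 'bigbox': start Cassandra and force a major compaction.
    sudo /etc/init.d/cassandra start
    nodetool -h localhost compact

    # When compaction finishes, stop Cassandra on 'bigbox' and rsync the
    # (now much smaller) data directory back to the original node.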
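The decommission-and-replace option is mostly a single command; the caveats are the inflated streaming and the temporary capacity loss mentioned above. Again just a sketch:

    # On the affected node: stream its replicas to the rest of the ring
    # and remove the node from the ring.
    nodetool -h localhost decommission

    # Then bring up a replacement that bootstraps at the token the old
    # node owned, by setting in the new node's cassandra.yaml:
    #
    #   initial_token: <token of the decommissioned node>
    #   auto_bootstrap: true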
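The wipe-and-repair option, assuming no reads are hitting the cluster, would be roughly:

    # Note the node's token first ('nodetool ring') and make sure it is
    # set as initial_token in cassandra.yaml, so the node comes back at
    # the same position in the ring.
    sudo /etc/init.d/cassandra stop

    # Remove all data; the node is now effectively empty, as if it had
    # lost its disk to hardware breakage.
    sudo rm -rf /var/lib/cassandra/data/*
    sudo rm -rf /var/lib/cassandra/commitlog/*

    # Start it back up and have the other replicas stream the data back.
    sudo /etc/init.d/cassandra start
    nodetool -h localhost repair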
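And the faster 'rebuild' variant, once the CASSANDRA-3483 patch is applied. Note that 'nodetool rebuild' does not exist in stock 0.8.6; this sketch assumes the patched nodetool exposes the command as described in the ticket:

    sudo /etc/init.d/cassandra stop

    # Wipe every keyspace directory except 'system', so the node keeps
    # its token and ring membership.
    cd /var/lib/cassandra/data
    for ks in *; do
        [ "$ks" = "system" ] || sudo rm -rf "$ks"
    done

    # Start the node (now missing data) and stream it back from the
    # other replicas, without repair's merkle tree comparison overhead.
    sudo /etc/init.d/cassandra start
    nodetool -h localhost rebuild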
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)