OK, so we just lost the data on that node. We are building the RAID on it, but once it is up, what is the best way to bring it back into the cluster?
- just let it come up and run nodetool repair
- copy data from another node and then run nodetool repair
- do I still need to run repair immediately if I copy the data? I would like to schedule the repair for later, during non-peak hours.

Like I said, I have 500G and am on 0.7.4, with a 3 node cluster and RF=3.

On Thu, Aug 18, 2011 at 9:42 PM, aaron morton <aa...@thelastpickle.com> wrote:

> You should get on 0.7.8 while you are doing this, this is a pretty good
> reason
> https://github.com/apache/cassandra/blob/cassandra-0.7.8/CHANGES.txt#L58
>
> Never done a read repair on this cluster before, is that a problem?
>
> Potentially.
> Repair will ensure that your data is distributed, and that deletes don't
> mysteriously come back to life
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
> Personally I would get a repair to complete before I started this process.
>
> You may want to make sure everything is compacted as best it can be
> beforehand, see some of the other threads about repair using a lot of space.
>
> * use nodetool to change the compaction threshold down to 2 for the CFs
> * trigger a minor compaction using nodetool flush
> * wait and monitor using nodetool compactionstats
>
> Then do a repair, repairing one CF at a time, starting with the smallest CF.
> Monitor disk space and
> nodetool compactionstats
> then
> nodetool netstats
>
> If you have the network space I would just move the files and then put them
> back...
>
> * drain
> * copy the /var/lib/cassandra/data and saved_caches dirs
> * copy the yaml
> * blast away
> * put things back in place
> * start up and run repair
>
> I know you have RF 3 and 3 nodes. I'm being cautious. If you don't have
> space the current approach is fine.
>
> You may want to disable Hinted Handoff while you are doing this as you are
> going to run repair anyway when the node comes back.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/08/2011, at 11:57 AM, Anand Somani wrote:
>
> Hi,
>
> version - 0.7.4
> cluster size = 3
> RF = 3.
> data size on a node ~500G
>
> I want to do some disk maintenance on a cassandra node, so the process that
> I came up with is
>
> - drain this node
> - back up the system data space
> - rebuild the disk partition
> - copy data from another node
> - copy data from the backed up system data
> - restart node
> - run nodetool repair
>
> Is this process sane? Never done a read repair on this cluster before, is
> that a problem? Should I run it per CF? Would it help if I did this before
> bringing the node down?
>
> Any pointers, things to worry about?
>
> Thanks
> Anand
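
For reference, the "repair one CF at a time, starting with the smallest" advice quoted above could be scripted roughly as below. This is only a dry-run sketch: it builds the nodetool command lines and prints them, it does not run anything. The host, keyspace, CF names, and sizes are made up for illustration, and the helper name repair_plan is hypothetical. You would still need to watch disk space yourself between steps.

```python
from typing import Dict, List


def repair_plan(host: str, keyspace: str, cf_sizes: Dict[str, int]) -> List[List[str]]:
    """Build (but do not execute) nodetool commands, smallest CF first.

    cf_sizes maps a column family name to its approximate on-disk size;
    the unit does not matter, it is only used for ordering.
    """
    base = ["nodetool", "-h", host]
    plan: List[List[str]] = []
    # Repair the smallest column family first, as suggested in the thread.
    for cf in sorted(cf_sizes, key=lambda name: cf_sizes[name]):
        plan.append(base + ["repair", keyspace, cf])
        # Commands to check on progress after kicking off each repair.
        plan.append(base + ["compactionstats"])
        plan.append(base + ["netstats"])
    return plan


if __name__ == "__main__":
    # Hypothetical keyspace, CF names and sizes, for illustration only.
    sizes = {"Events": 400, "Users": 20, "Sessions": 80}
    for cmd in repair_plan("10.0.0.1", "MyKeyspace", sizes):
        print(" ".join(cmd))
```

Printing the plan first makes it easy to review the order before pasting the commands into a shell during off-peak hours.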