The DataStax documentation should reflect current best practices:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
If you or anybody else finds it inadequate, speak up.
-- Jack Krupansky
-----Original Message-----
From: Paolo Crosato
Sent: Thursday, June 19, 2014 10:13 AM
To: user@cassandra.apache.org
Subject: Best practices for repair
Hi everybody,
we are having trouble running repairs on a regular schedule. We have a
three-node deployment, and every week we start a repair on one node,
repairing one column family at a time.
However, when we reach the big column families, the repair sessions
usually hang indefinitely, and we have to restart them manually.
The script runs commands like:
nodetool repair keyspace columnfamily
one by one.
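
For illustration, a loop of this kind might look like the Python sketch
below. The keyspace and column family names passed in are placeholders,
and the timeout is an assumption added here so that a hung session is
reported instead of blocking the script forever. Note that killing the
nodetool client may not stop the repair session on the server side.

```python
import subprocess


def repair_command(keyspace, column_family):
    """Build the nodetool argument list for repairing one column family."""
    return ["nodetool", "repair", keyspace, column_family]


def run_repairs(keyspace, column_families, timeout_s=6 * 3600,
                runner=subprocess.run):
    """Repair column families one at a time.

    Returns the column families whose repair failed or exceeded the
    timeout. `timeout_s` is an arbitrary illustrative value, and
    `runner` is injectable only so the loop can be exercised without
    a cluster.
    """
    failed = []
    for cf in column_families:
        try:
            result = runner(repair_command(keyspace, cf), timeout=timeout_s)
            if result.returncode != 0:
                failed.append(cf)
        except subprocess.TimeoutExpired:
            # The client gave up waiting; the server may still be repairing.
            failed.append(cf)
    return failed
```

The returned list can then be retried or reported, rather than noticed
only when someone checks on the stuck session by hand.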
This has not been a major issue so far, since we never delete data;
however, we would like to sort it out once and for all.
Reading resources on the net, I came to the conclusion that we could:
1) either run a repair session like the one above, but with the -pr
switch, and run it on every node, not just on one
2) or run a subrange repair as described here
http://www.datastax.com/dev/blog/advanced-repair-techniques , which
would be the best option.
However, the latter procedure would require us to write a Java program
that calls describe_splits to get the tokens to feed to nodetool repair.
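
As a rough sketch of the two options, the Python below builds the
nodetool command lines for each. It is not the describe_splits
approach: instead of asking Cassandra for splits based on the data, it
simply divides a token range into equal-width pieces by arithmetic,
which is an assumption made here for illustration. It also assumes the
Murmur3Partitioner token bounds; the function names are hypothetical.

```python
# Token bounds for Murmur3Partitioner (assumed partitioner).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def primary_range_command(keyspace, column_family):
    """Option 1: repair only the node's primary range (run on every node)."""
    return ["nodetool", "repair", "-pr", keyspace, column_family]


def split_range(start, end, parts):
    """Split (start, end] into `parts` contiguous, near-equal sub-ranges."""
    width = end - start
    if parts < 1 or width < parts:
        raise ValueError("need 1 <= parts <= end - start")
    bounds = [start + width * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def subrange_commands(keyspace, column_family, start, end, parts):
    """Option 2: one repair command per sub-range, using -st and -et.

    How nodetool parses negative token values on the command line may
    depend on the version; verify before relying on this.
    """
    return [
        ["nodetool", "repair", "-st", str(s), "-et", str(e),
         keyspace, column_family]
        for s, e in split_range(start, end, parts)
    ]
```

Equal-width pieces only approximate equal amounts of data; splits
obtained from describe_splits would follow the actual data
distribution more closely.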
Is it true that the second procedure is available out of the box only
in the commercial version of OpsCenter?
I would like to know if these are the current best practices for
repairs, or if there is some other option that makes repair easier to
perform and more reliable than it is now.
Regards,
Paolo Crosato
--
Paolo Crosato
Software engineer/Custom Solutions
e-mail: paolo.cros...@targaubiest.com