>The book is wrong, at least by current versions of Cassandra (I'm
>basing that on the quote you pasted, I don't know the context).

To make sure I didn't misunderstand (English is not my mother tongue),
here is what the entire "repair paragraph" says:

Basic Maintenance
There are a few tasks that you’ll need to perform before or after more
impactful tasks. For example, it makes sense to take a snapshot only
after you’ve performed a flush. So in this section we look at some of
these basic maintenance tasks: repair, snapshot, and cleanup.

Repair
Running nodetool repair causes Cassandra to execute a major compaction.
A Merkle tree of the data on the target node is computed, and the Merkle
tree is compared with those of other replicas. This step makes sure that
any data that might be out of sync with other nodes isn’t forgotten.
During a major compaction (see “Compaction” in the Glossary), the server
initiates a TreeRequest/TreeReponse conversation to exchange Merkle
trees with neighboring nodes. The Merkle tree is a hash representing the
data in that column family. If the trees from the different nodes don’t
match, they have to be reconciled (or “repaired”) in order to determine
the latest data values they should all be set to. This tree comparison
validation is the responsibility of the
org.apache.cassandra.service.AntiEntropyService class.
AntiEntropyService implements the Singleton pattern and defines the
static Differencer class as well, which is used to compare two trees. If
it finds any differences, it launches a repair for the ranges that don’t
agree.
So although Cassandra takes care of such matters automatically on
occasion, you can run it yourself as well.
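
To check my own understanding of the tree comparison described in that
excerpt, here is a rough toy sketch in Python. It is not Cassandra's
AntiEntropyService or Differencer code; the function names, the
fixed number of buckets standing in for token ranges, and the choice of
hash functions are all my own assumptions, picked only to show the idea
of comparing two hash trees and finding the ranges that disagree.

```python
import hashlib

BUCKETS = 4  # hypothetical number of ranges; must be a power of two


def bucket_of(key, buckets=BUCKETS):
    """Map a row key to a bucket (a stand-in for a token range)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets


def leaf_hashes(rows, buckets=BUCKETS):
    """Hash every row of one replica into the leaf for its bucket."""
    digests = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(rows):
        digests[bucket_of(key, buckets)].update(f"{key}={rows[key]};".encode())
    return [d.digest() for d in digests]


def build_tree(leaves):
    """Return the tree as a list of levels: leaves first, root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            hashlib.sha256(prev[i] + prev[i + 1]).digest()
            for i in range(0, len(prev), 2)
        ])
    return levels


def differing_buckets(tree_a, tree_b):
    """Walk down from the root, descending only where hashes differ."""
    stack = [(len(tree_a) - 1, 0)]
    mismatched = []
    while stack:
        level, i = stack.pop()
        if tree_a[level][i] == tree_b[level][i]:
            continue  # identical subtree, nothing to repair here
        if level == 0:
            mismatched.append(i)
        else:
            stack.extend([(level - 1, 2 * i), (level - 1, 2 * i + 1)])
    return sorted(mismatched)


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = dict(replica_a, k2="stale")  # one row out of sync

tree_a = build_tree(leaf_hashes(replica_a))
tree_b = build_tree(leaf_hashes(replica_b))
print(differing_buckets(tree_a, tree_b))  # buckets that would need repair
```

Only the buckets whose hashes differ would have to be exchanged between
the replicas, which is the point of comparing trees instead of
comparing all the data.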



>
>nodetool repair must be scheduled by the operator to run regularly.
>The name "repair" is a bit unfortunate; it is not meant to imply that
>it only needs to run when something is "wrong".
>
>-- 
>/ Peter Schuller
>

