Hi Vish,

> 1. This tool repairs inconsistencies across replicas of the row. Since
> latest update always wins, I dont see inconsistencies other than ones
> resulting from the combination of deletes, tombstones, and crashed nodes.
> Technically, if data is never deleted from cassandra, then nodetool repair
> does not need to be run at all. Is this understanding correct? If wrong,
> can anyone provide other ways inconsistencies could occur?

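One way replicas can diverge with no deletes involved is a write that arrives while a replica is down and is never replayed to it, for example because the hint is lost or expires. The following is a minimal Python sketch of that case, not Cassandra's actual storage or repair code; the `write`, `read_one`, and `repair` helpers and the `user:1` key are hypothetical names made up for illustration.

```python
# Toy model: each replica maps key -> (write_timestamp, value).
# Illustration only; this is not how Cassandra stores or repairs data.

def write(replicas, live, key, ts, value):
    """Apply a write only to the replicas that are currently up."""
    for name in live:
        replicas[name][key] = (ts, value)

def read_one(replicas, name, key):
    """Read from a single replica, similar in spirit to consistency level ONE."""
    return replicas[name].get(key)

def repair(replicas, key):
    """Bring every replica to the newest version (last write wins)."""
    versions = [r[key] for r in replicas.values() if key in r]
    newest = max(versions)  # tuples compare by timestamp first
    for r in replicas.values():
        r[key] = newest

replicas = {"a": {}, "b": {}, "c": {}}
write(replicas, ["a", "b", "c"], "user:1", 1, "old")

# Replica "b" is down for this write; assume the hint is never delivered.
write(replicas, ["a", "c"], "user:1", 2, "new")

stale = read_one(replicas, "b", "user:1")   # "b" still holds the old version
repair(replicas, "user:1")
fixed = read_one(replicas, "b", "user:1")   # now matches the other replicas
```

In this toy model nothing is ever deleted, yet replica "b" serves stale data until something reconciles it. If the row is rarely read, read repair may not get the chance to do so.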
Even if you never delete data you should run repairs occasionally to ensure overall consistency. While hinted handoffs and read repairs do lead to better consistency, they are only helpers/optimizations and are not regarded as operations that ensure consistency.

> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
> multi data center setup. As we add nodes to the cluster in various data
> centers, does the performance of nodetool repair on each node increase
> linearly, or is it quadratic ?

It's difficult to calculate the performance of a repair; I've seen the time to completion fluctuate between 4 and 10+ hours on the same node. However, in theory, adding more nodes would spread the data and free up machine resources, resulting in faster repairs.

> The essence of this question is: If I have a keyspace with x number of
> replicas in each data center, do I have to deal with an upper limit on the
> number of data centers/nodes?

Could you expand on why you believe there would be an upper limit of dc/nodes due to running repairs?

Mark

On Tue, Aug 12, 2014 at 10:06 PM, Viswanathan Ramachandran <
vish.ramachand...@gmail.com> wrote:

> Some questions on nodetool repair.
>
> 1. This tool repairs inconsistencies across replicas of the row. Since
> latest update always wins, I dont see inconsistencies other than ones
> resulting from the combination of deletes, tombstones, and crashed nodes.
> Technically, if data is never deleted from cassandra, then nodetool repair
> does not need to be run at all. Is this understanding correct? If wrong,
> can anyone provide other ways inconsistencies could occur?
>
> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
> multi data center setup. As we add nodes to the cluster in various data
> centers, does the performance of nodetool repair on each node increase
> linearly, or is it quadratic ?
> The essence of this question is: If I have a
> keyspace with x number of replicas in each data center, do I have to deal
> with an upper limit on the number of data centers/nodes?
>
>
> Thanks
>
> Vish