Hi Vish,

> 1. This tool repairs inconsistencies across replicas of the row. Since
> latest update always wins, I don't see inconsistencies other than ones
> resulting from the combination of deletes, tombstones, and crashed nodes.
> Technically, if data is never deleted from cassandra, then nodetool repair
> does not need to be run at all. Is this understanding correct? If wrong,
> can anyone provide other ways inconsistencies could occur?
>

Even if you never delete data, you should still run repairs occasionally to
ensure overall consistency. Hinted handoff and read repair do improve
consistency, but they are best-effort optimizations, not operations that
guarantee it: a hint can be lost (for example, when a replica stays down
longer than max_hint_window_in_ms), and read repair only fixes rows that are
actually read.
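A toy model may help show how replicas can diverge without any deletes. The
sketch below is hypothetical: `Replica`, `write` and `repair` are illustrative
names, not Cassandra APIs, and the "repair" simply pushes the newest cell to
every replica (the end result of anti-entropy repair, not how Merkle-tree
repair is actually implemented). It assumes a write lands while one replica is
down and that replica's hint is discarded, e.g. because it stayed down longer
than max_hint_window_in_ms.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp; the higher one wins (last-write-wins)


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> Cell

    def apply(self, key, cell):
        # Last-write-wins: only overwrite if the incoming cell is newer.
        current = self.data.get(key)
        if current is None or cell.timestamp > current.timestamp:
            self.data[key] = cell


def write(replicas, key, cell, hints):
    """Toy write: live replicas apply it, down replicas get a hint instead."""
    for r in replicas:
        if r.up:
            r.apply(key, cell)
        else:
            hints.append((r.name, key, cell))


def repair(replicas, key):
    """Hypothetical anti-entropy: find the newest cell and push it everywhere."""
    cells = [r.data[key] for r in replicas if key in r.data]
    newest = max(cells, key=lambda c: c.timestamp)
    for r in replicas:
        r.apply(key, newest)
    return newest


replicas = [Replica("A"), Replica("B"), Replica("C")]
hints = []
write(replicas, "row1", Cell("v1", 1), hints)

replicas[2].up = False  # C goes down
write(replicas, "row1", Cell("v2", 2), hints)  # A and B get v2, C gets a hint
hints.clear()  # assumption: C stays down past the hint window, so the hint is dropped
replicas[2].up = True  # C comes back, still holding v1; nothing was deleted

stale = replicas[2].data["row1"].value  # a read served only by C sees "v1"
repair(replicas, "row1")
healed = [r.data["row1"].value for r in replicas]
print(stale, healed)
```

The stale replica keeps serving v1 until something copies v2 to it; read
repair would only do that if the row happens to be read, which is why a
periodic repair is still worthwhile even without deletes.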

> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
> multi data center setup. As we add nodes to the cluster in various data
> centers, does the performance of nodetool repair on each node increase
> linearly, or is it quadratic ?
>

It's difficult to predict how long a repair will take; I've seen the time to
completion on the same node fluctuate from 4 hours to over 10 hours. In
theory, however, adding more nodes spreads the data so that each node holds
less of it, which frees up machine resources and should make each node's
repair faster.
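To make the scaling question a bit more concrete, here is a rough cost model
under stated assumptions (it is not how Cassandra itself measures repair
cost): data is evenly balanced, the node runs a full repair of every range it
replicates, and every replica of those ranges in every data center scans its
copy to build a Merkle tree. `validation_work_gb` and the cluster sizes are
hypothetical.

```python
def validation_work_gb(total_data_gb, rf_per_dc, nodes_per_dc, dc):
    """Toy estimate of data scanned when one node in `dc` runs a full repair.

    Assumptions: evenly balanced data, and every replica (in every DC) of each
    range the node holds builds a Merkle tree over that range.
    """
    # Share of the unique data this node stores in its own DC.
    data_on_node = total_data_gb * rf_per_dc[dc] / nodes_per_dc[dc]
    # Each of those ranges has this many replicas across all DCs.
    replicas_per_range = sum(rf_per_dc.values())
    return data_on_node * replicas_per_range


# Hypothetical cluster: 1200 GB of unique data, RF 3 per DC, 6 nodes per DC.
one_dc = validation_work_gb(1200, {"dc1": 3}, {"dc1": 6}, "dc1")
bigger_dc = validation_work_gb(1200, {"dc1": 3}, {"dc1": 12}, "dc1")
two_dcs = validation_work_gb(
    1200, {"dc1": 3, "dc2": 3}, {"dc1": 6, "dc2": 6}, "dc1")
three_dcs = validation_work_gb(
    1200, {"dc1": 3, "dc2": 3, "dc3": 3}, {"dc1": 6, "dc2": 6, "dc3": 6}, "dc1")
print(one_dc, bigger_dc, two_dcs, three_dcs)
```

Under this model, doubling the nodes in a data center halves the work
triggered by each node's repair, while each additional data center at RF 3
adds a fixed amount, so the per-node cost grows linearly with the number of
data centers rather than hitting a hard limit. Real repairs also depend on
streaming, compaction and load, which the model ignores.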

> The essence of this question is: If I have a keyspace with x number of
> replicas in each data center, do I have to deal with an upper limit on the
> number of data centers/nodes?


Could you expand on why you believe running repairs would impose an upper
limit on the number of data centers/nodes?


Mark


On Tue, Aug 12, 2014 at 10:06 PM, Viswanathan Ramachandran <
vish.ramachand...@gmail.com> wrote:

> Some questions on nodetool repair.
>
> 1. This tool repairs inconsistencies across replicas of the row. Since
> latest update always wins, I don't see inconsistencies other than ones
> resulting from the combination of deletes, tombstones, and crashed nodes.
> Technically, if data is never deleted from cassandra, then nodetool repair
> does not need to be run at all. Is this understanding correct? If wrong,
> can anyone provide other ways inconsistencies could occur?
>
> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
> multi data center setup. As we add nodes to the cluster in various data
> centers, does the performance of nodetool repair on each node increase
> linearly, or is it quadratic ? The essence of this question is: If I have a
> keyspace with x number of replicas in each data center, do I have to deal
> with an upper limit on the number of data centers/nodes?
>
>
> Thanks
>
> Vish
>
