Thanks Mark,

Since we have replicas in each data center, adding a new data center (and therefore new replicas) has a performance implication for nodetool repair. I understand that adding nodes without increasing the number of replicas may improve repair performance, but in this case we are adding a new data center and additional replicas, which adds overhead to nodetool repair. Hence my thinking that we may reach an upper limit: the point at which the cost of running nodetool repair becomes too high.
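One rough way to frame the "linear or quadratic" question is the toy Python model below. It is not Cassandra code. It rests on an assumption: for each token range, every replica builds one Merkle tree (a "validation" that reads the range's data), and the repair coordinator then compares every pair of trees. `RepairCostModel` and its fields are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairCostModel:
    """Hypothetical toy model of the work one repair does for a single token range.

    Assumption (not taken from Cassandra's source): every replica builds one
    Merkle tree, and the coordinator compares every unordered pair of trees.
    """

    data_centers: int
    replicas_per_dc: int

    @property
    def replicas(self) -> int:
        # Total replicas of the range across all data centers.
        return self.data_centers * self.replicas_per_dc

    @property
    def validations(self) -> int:
        # One tree build (a full read of the range) per replica: grows linearly.
        return self.replicas

    @property
    def tree_comparisons(self) -> int:
        # Every unordered pair of trees is compared: R * (R - 1) / 2, grows quadratically.
        r = self.replicas
        return r * (r - 1) // 2


if __name__ == "__main__":
    # Keep 3 replicas per data center and add data centers one at a time.
    for dcs in range(1, 5):
        m = RepairCostModel(data_centers=dcs, replicas_per_dc=3)
        print(f"{dcs} DC(s): {m.replicas} replicas, "
              f"{m.validations} validations, {m.tree_comparisons} comparisons")
```

Under this model, keeping 3 replicas per data center and going from one to four data centers grows the tree builds linearly (3, 6, 9, 12) while the pairwise comparisons grow quadratically (3, 15, 36, 66). The tree builds read every partition in the range, so they are likely to dominate in practice, since comparing two in-memory trees is comparatively cheap. The model also ignores streaming of mismatched data and cross-data-center network latency, so treat it as a back-of-the-envelope estimate rather than a prediction of repair time.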
On Tue, Aug 12, 2014 at 2:59 PM, Mark Reddy <mark.re...@boxever.com> wrote:

> Hi Vish,
>
>> 1. This tool repairs inconsistencies across replicas of the row. Since
>> latest update always wins, I don't see inconsistencies other than ones
>> resulting from the combination of deletes, tombstones, and crashed nodes.
>> Technically, if data is never deleted from Cassandra, then nodetool repair
>> does not need to be run at all. Is this understanding correct? If wrong,
>> can anyone provide other ways inconsistencies could occur?
>
> Even if you never delete data you should run repairs occasionally to
> ensure overall consistency. While hinted handoffs and read repairs do lead
> to better consistency, they are only helpers/optimizations and are not
> regarded as operations that ensure consistency.
>
>> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
>> multi data center setup. As we add nodes to the cluster in various data
>> centers, does the performance of nodetool repair on each node increase
>> linearly, or is it quadratic?
>
> It's difficult to calculate the performance of a repair; I've seen the time
> to completion fluctuate between 4hrs and 10hrs+ on the same node. However, in
> theory adding more nodes would spread the data and free up machine
> resources, thus resulting in more performant repairs.
>
>> The essence of this question is: If I have a keyspace with x number of
>> replicas in each data center, do I have to deal with an upper limit on the
>> number of data centers/nodes?
>
> Could you expand on why you believe there would be an upper limit of
> dc/nodes due to running repairs?
>
> Mark
>
> On Tue, Aug 12, 2014 at 10:06 PM, Viswanathan Ramachandran <
> vish.ramachand...@gmail.com> wrote:
>
>> Some questions on nodetool repair.
>>
>> 1. This tool repairs inconsistencies across replicas of the row. Since
>> latest update always wins, I don't see inconsistencies other than ones
>> resulting from the combination of deletes, tombstones, and crashed nodes.
>> Technically, if data is never deleted from Cassandra, then nodetool repair
>> does not need to be run at all. Is this understanding correct? If wrong,
>> can anyone provide other ways inconsistencies could occur?
>>
>> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
>> multi data center setup. As we add nodes to the cluster in various data
>> centers, does the performance of nodetool repair on each node increase
>> linearly, or is it quadratic? The essence of this question is: If I have a
>> keyspace with x number of replicas in each data center, do I have to deal
>> with an upper limit on the number of data centers/nodes?
>>
>> Thanks
>>
>> Vish