From ticket 2818:
"One (reasonably simple) proposition to fix this would be to have repair
schedule validation compactions across nodes one by one (ie, one CF/range
at a time), waiting for all nodes to return their tree before submitting
the next request. Then on each node, we should make sure that the node will
start the validation compaction as soon as requested. For that, we probably
want to have a specific executor for validation compaction"
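The scheduling the ticket proposes can be sketched roughly as follows. This is an illustrative outline only, not Cassandra's actual code: the names (`repair`, `request_validation`) and the futures-style interface are assumptions. The point is that the coordinator handles one CF/range at a time and blocks until every replica has returned its Merkle tree before submitting the next validation request.

```python
# Hypothetical sketch of the proposed repair scheduling: validate one
# (column family, range) pair at a time, waiting for all replicas'
# Merkle trees before moving on. `request_validation(node, cf, rng)`
# is an assumed helper that returns a zero-argument callable which
# blocks until that node's tree is ready (a poor man's future).

def repair(column_families, ranges, replicas, request_validation):
    """Run validation compactions strictly one CF/range at a time."""
    trees = {}
    for cf in column_families:
        for rng in ranges:
            # Ask every replica for its Merkle tree of this CF/range...
            pending = [request_validation(node, cf, rng) for node in replicas]
            # ...and block until all trees are back before submitting the
            # next request, so validations on a node are started as soon
            # as requested rather than queueing behind other compactions.
            trees[(cf, rng)] = [fut() for fut in pending]
    return trees
```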
.. This was how I thought repair worked.
Anyway, in our case, we only have one CF, so I'm not sure if both issues
apply to my situation.
Thanks. Looking forward to the release where these 2 things are fixed.
On , Jonathan Ellis <jbel...@gmail.com> wrote:
On Thu, Jul 21, 2011 at 9:14 AM, Jonathan Colby
<jonathan.co...@gmail.com> wrote:
> I regularly run repair on my cassandra cluster. However, I often see
> that during the repair operation very large amounts of data are
> transferred to other nodes.
https://issues.apache.org/jira/browse/CASSANDRA-2280
https://issues.apache.org/jira/browse/CASSANDRA-2816
> My question is, if only some data is out of sync, why are entire Data
> files being transferred?
Repair streams ranges of files as a unit (each streamed range becomes a
new file on the target node) rather than using the normal write path.
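A toy model of why this over-streams: each Merkle-tree leaf covers a whole sub-range of rows, and any hash mismatch causes the entire sub-range to be streamed, even if only one row inside it actually differs. This is not Cassandra code; the bucketing scheme and function names are made up purely to illustrate the mechanism.

```python
import hashlib

def hash_bucket(key, leaves):
    # Toy partitioner: assign a row key to one of `leaves` sub-ranges.
    return ord(key[0]) % leaves

def leaf_hashes(rows, leaves):
    """Group rows into sub-ranges and hash each sub-range as a unit."""
    buckets = [[] for _ in range(leaves)]
    for key, value in sorted(rows.items()):
        buckets[hash_bucket(key, leaves)].append((key, value))
    hashes = [hashlib.sha256(repr(b).encode()).hexdigest() for b in buckets]
    return hashes, buckets

def rows_to_stream(local, remote, leaves=4):
    """Return every local row in any sub-range whose leaf hash differs."""
    local_hashes, local_buckets = leaf_hashes(local, leaves)
    remote_hashes, _ = leaf_hashes(remote, leaves)
    out = []
    for i in range(leaves):
        if local_hashes[i] != remote_hashes[i]:
            # The whole sub-range streams, not just the mismatched row.
            out.extend(local_buckets[i])
    return out
```

For example, if only row "a" differs between replicas but row "e" falls in the same sub-range, both rows are streamed; rows in matching sub-ranges are skipped.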
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com