Re: Why data tripled in size after repair?

Sylvain Lebresne Thu, 27 Sep 2012 09:52:44 -0700

> I don't understand why it copied data twice. In worst case scenario it
> should copy everything (~90G)


Sadly no, repair is currently peer-to-peer based (there is a ticket to
fix it: https://issues.apache.org/jira/browse/CASSANDRA-3200, but
that's not trivial). This means that you can end up with RF times the
data after a repair. Obviously that should be a worst-case scenario, as
it implies everything is repaired, but the triplication itself is a
real problem, a known one that is not so easy to fix.
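
As a rough back-of-the-envelope illustration of that worst case (a sketch only: the function name is made up for this example, and RF=3 is an assumption taken from the "tripled" in the subject line, not something stated in this thread):

```python
def worst_case_size_after_repair(node_data_gb, replication_factor):
    """Upper bound on a node's data size after repair, in GB.

    Assumes the worst case described above: everything gets repaired,
    so the node can end up with RF times its data.
    """
    return node_data_gb * replication_factor


# ~90G per node (from the quoted question), RF=3 (assumed)
print(worst_case_size_after_repair(90, 3))
```

With those numbers the bound is about 270G, well above the ~90G you would expect if repair only copied everything once.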

Is it possible that each time you've run repair, one of the nodes in
the cluster was very out of sync with the other nodes? Maybe a node
that had been down for a long time?

--
Sylvain
