Thanks Andrey. Also found this ticket regarding this issue: https://issues.apache.org/jira/browse/CASSANDRA-2698
On Tue, Oct 16, 2012 at 8:00 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>> In my experience running repair on some counter data, the size of
>> streamed data is much bigger than the cluster could possibly have lost
>> messages or would be due to snapshotting at different times.
>>
>> I know the data will eventually be in sync after every repair, but I'm
>> more interested in whether Cassandra transfers excess data and how to
>> minimize this.
>>
>> Does anybody have insights on this?
>>
> The problem is the granularity of the Merkle tree. Cassandra streams the
> regions whose hash values differ, and a region can be much bigger than a
> single row.
>
> Andrey
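To illustrate the granularity point: here is a minimal sketch (not Cassandra's actual implementation; names like `LEAF_SPAN` and `ranges_to_stream` are made up for this example) showing how a Merkle-tree leaf that covers a range of rows forces the whole range to be streamed when even one row inside it differs.

```python
import hashlib

# Hypothetical sketch: each Merkle tree leaf covers a *range* of rows and
# stores one hash over all of them. If any single row in the range differs,
# the two replicas' leaf hashes disagree, and repair streams the entire
# range -- not just the one mismatched row.

LEAF_SPAN = 4  # rows per leaf; coarser leaves mean more excess streaming

def leaf_hashes(rows):
    """Hash each contiguous LEAF_SPAN-sized range of rows."""
    hashes = []
    for i in range(0, len(rows), LEAF_SPAN):
        chunk = rows[i:i + LEAF_SPAN]
        digest = hashlib.md5("".join(chunk).encode()).hexdigest()
        hashes.append((i, i + len(chunk), digest))
    return hashes

def ranges_to_stream(rows_a, rows_b):
    """Row ranges whose leaf hashes disagree between two replicas."""
    return [(lo, hi)
            for (lo, hi, ha), (_, _, hb) in zip(leaf_hashes(rows_a),
                                                leaf_hashes(rows_b))
            if ha != hb]

replica_a = ["row%d" % i for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = "row5-diverged"   # a single out-of-sync row

# One differing row causes its whole 4-row leaf range [4, 8) to be streamed.
print(ranges_to_stream(replica_a, replica_b))  # [(4, 8)]
```

With a coarser `LEAF_SPAN`, the same single-row divergence drags even more unrelated rows into the stream, which matches the over-transfer observed in the thread.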