We have major repair enabled on every node, running every 7 days.
I think you mean there are 2 cases of a "failed" write.
One is a replication failure on a replica. Duplication generated from this
kind of "failure" should be very small in my case, because I only parse the data
from 12 nodes, which should NOT contain any replica nodes.
If one node persists a write, plus a "hint" for a failed replication write, that
write will still be stored as one write in its SSTable files, right? Why would it
need to store 2 copies as duplicates in the SSTable files?
The other case is what you describe: the client retries the write when a
timeout exception happens. That can reasonably explain the duplication.
Here are the duplication counts found in our SSTable files. You can see that a
lot of data is duplicated 2 times, but some has an even higher count. The max
duplication count is 27, though; would one client really retry 27 times?
duplication_count    duplication_occurrence
2                    123615348
3                    6446783
4                    21102
5                    1054
6                    2496
7                    47
8                    726
9                    52
10                   12
11                   3
12                   7
13                   9
14                   7
15                   3
16                   2
17                   2
18                   1
19                   5
20                   5
22                   1
23                   3
25                   2
27                   99
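
For reference, this is roughly how such a histogram might be produced; this is
only a sketch, assuming the SSTables were exported with sstable2json into JSON
files whose rows carry a "key" field (the exact dump layout varies by Cassandra
version), and the dump directory is hypothetical:

    import glob
    import json
    from collections import Counter

    # Count, for each row key, how many exported SSTable dumps it appears in.
    key_counts = Counter()
    for path in glob.glob("/tmp/sstable_dumps/*.json"):   # hypothetical location
        with open(path) as f:
            for row in json.load(f):
                key_counts[row["key"]] += 1

    # Histogram: duplication_count -> duplication_occurrence
    histogram = Counter(c for c in key_counts.values() if c > 1)
    for dup_count, occurrences in sorted(histogram.items()):
        print(dup_count, occurrences)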
Another question: do you have any guess what could cause case 2 from my
original email to happen?
Thanks
Date: Tue, 22 Oct 2013 17:52:24 -0700
Subject: Re: Questions related to the data in SSTable files
From: rc...@eventbrite.com
To: user@cassandra.apache.org

On Tue, Oct 22, 2013 at 5:17 PM, java8964 java8964 <java8...@hotmail.com> wrote:

Is there any way I can verify how often the system is being "repaired"? I can ask
the other group who maintains the Cassandra cluster. But do you mean that even
failed writes will be stored in the SSTable files?

"repair" sessions are logged in system.log, and the "best practice" is to run a 
repair once every gc_grace_seconds, which defaults to 10 days.
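
For illustration, a rough way to check how often repairs actually ran is to pull
the dates of repair-related lines out of system.log; this sketch assumes the
default log location and that such lines contain the word "repair" (the exact
message text varies between Cassandra versions):

    import re

    LOG_PATH = "/var/log/cassandra/system.log"   # adjust for your install
    DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

    # Collect the distinct dates on which repair-related messages were logged.
    repair_days = set()
    with open(LOG_PATH) as log:
        for line in log:
            if "repair" in line.lower():
                match = DATE_RE.search(line)
                if match:
                    repair_days.add(match.group())

    for day in sorted(repair_days):
        print(day)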

A "failed" write means only that it "failed" to meet its ConsistencyLevel in 
the request_timeout. It does not mean that it failed to write everywhere it 
tried to write. There is no rollback, so in practice with RF>1 it is likely 
that a "failed" write succeeded at least somewhere. But if any failure is 
noted, Cassandra will generate a hint for hinted handoff and attempt to 
redeliver the "failed" write. Also, many/most client applications will respond 
to a timedoutexception by attempting to re-write the "failed" write, using the 
same client timestamp.
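
For illustration, a minimal sketch of that retry pattern; the execute callable
and the table are hypothetical stand-ins for whatever client library is in use.
The point is that the write timestamp is chosen once, client-side, and reused on
every attempt, so a write that lands more than once resolves to the same cell
rather than looking like newer data:

    import time

    def write_with_retry(execute, retries=3):
        # `execute` stands in for whatever your client exposes; the table and
        # values below are made up. Every attempt carries the same
        # USING TIMESTAMP value, chosen once before the first try.
        write_ts = int(time.time() * 1000000)  # microseconds, chosen once
        query = ("INSERT INTO ks.events (id, payload) VALUES (%s, %s) "
                 "USING TIMESTAMP {}".format(write_ts))
        last_error = None
        for _ in range(retries):
            try:
                execute(query, ("some-id", "some-payload"))
                return
            except Exception as exc:  # e.g. a timeout raised by the driver
                last_error = exc
        raise last_error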

Repair has a fixed granularity, so the larger the size of your dataset, the more
"over-repair" any given "repair" will cause.
Duplicates occur as a natural consequence of this: if you have 1 row which
differs in a merkle tree chunk and the merkle tree chunk is, for example,
1000 rows, you will "repair" one row and "duplicate" the other 999.
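
As a toy illustration of that arithmetic (all numbers made up):

    # Over-repair granularity: if a Merkle tree range covers 1000 rows and
    # only 1 genuinely differs, repairing that range re-streams the whole
    # range, duplicating the other 999 rows.
    rows_per_chunk = 1000      # rows covered by one Merkle tree range (made up)
    mismatched_rows = 1        # rows that actually differ within the range
    duplicated = rows_per_chunk - mismatched_rows
    print(duplicated)          # -> 999 rows duplicated for 1 real repair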
 
=Rob                                      
