On Wed, Oct 23, 2013 at 5:23 AM, java8964 java8964 <java8...@hotmail.com> wrote:

> We enabled the major repair on every node every 7 days.

This is almost certainly the cause of your many duplicates. If you don't
DELETE heavily, consider changing gc_grace_seconds to 34 days and then
doing a repair on the first of the month.

> If one node persists a write, plus a "hint" for a failed replication
> write, this write will still be stored as one write in its SSTable files,
> right? Why does it need to store 2 copies as duplicates in SSTable files?

A write destined for replica nodes A, B, and C comes into A. The write
"fails" but actually succeeds in replicating to B. A writes it as a hint.
B flushes its memtable. A then delivers the hint to B, creating another
copy of the identical write in a memtable. B then flushes this new
memtable. There are now two copies of the same write on disk.

> Here is the duplication count that occurred in our SSTable files. You can
> see a lot of data duplicated 2 times, but also some with even higher
> counts. The max duplication count is 27; can one client retry 27 times?

That many duplicates are almost certainly a result of repair
over-repairing. Re-read this chunk from my previous mail:

> Repair has a fixed granularity, so the larger the size of your dataset
> the more "over-repair" any given "repair" will cause.
>
> Duplicates occur as a natural consequence of this: if you have 1 row
> which differs in a merkle tree chunk and the merkle tree chunk is, for
> example, 1000 rows, you will "repair" one row and "duplicate" the other
> 999.

Question #2 from your original mail is also almost certainly a result of
"over-repair." The "duplicate" chunks can be from any time.

=Rob

PS - What Cassandra version?
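To make the hint scenario above concrete, here is a minimal sketch (a toy
model in Python, not Cassandra code; the names `memtable`, `sstables`,
`write`, and `flush` are all hypothetical) of how a delivered hint that
lands in a fresh memtable leaves two identical copies of one write on disk:

```python
sstables = []   # each flush produces one immutable "SSTable" (a list of writes)
memtable = []   # in-memory buffer of (row_key, value, timestamp) writes

def write(row_key, value, timestamp):
    memtable.append((row_key, value, timestamp))

def flush():
    # Flushing snapshots the memtable to disk and starts a new, empty one.
    global memtable
    if memtable:
        sstables.append(list(memtable))
        memtable = []

# 1. Replica B receives the original write (coordinator A thinks it failed,
#    so A also stores a hint), then B flushes -> copy #1 on disk.
write("k1", "v1", 100)
flush()

# 2. A later delivers the hint: the identical write lands in B's new
#    memtable, which is flushed in turn -> copy #2 on disk.
write("k1", "v1", 100)
flush()

# Count on-disk copies of the identical write across all SSTables.
copies = sum(table.count(("k1", "v1", 100)) for table in sstables)
print(copies)  # -> 2
```

Compaction would eventually merge the two SSTables and reconcile the
duplicates back to a single copy, which is why this is an on-disk space
issue rather than a correctness issue.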
> We enabled the major repair on every node every 7 days. > This is almost certainly the cause of your many duplicates. If you don't DELETE heavily, consider changing gc_grace_seconds to 34 days and then doing a repair on the first of the month. > If one node persistent a write, plus a "hint" of failed replication write, > this write will still store as one write in its SSTable files, right? Why > need to store 2 copies as duplication in SSTable files? > Write destined for replica nodes A B C. Write comes into A. Write "fails" but actually succeeds in replicating to B. A writes it as a hint. B flushes its memtable. A then delivers hint to B, creating another copy of the identical write in a memtable. B then flushes this new memtable. There are now two copies of the same write on disk. > Here is the duplication count happened in our SSTable files. You can see a > lot of data duplicate 2 times, but also some with even higher number. But > max duplication count is 27, can one client retry 27 times? > This many duplicates are almost certainly a result of repair over-repairing. Re-read this chunk from my previous mail : > Repair has a fixed granularity, so the larger the size of your dataset the > more "over-repair" any given "repair" will cause. > > Duplicates occur as a natural consequences of this, if you have 1 row > which differs in the merkle tree chunk and the merkle tree chunk is, for > example, 1000 rows.. you will "repair" one row and "duplicate" the other > 999. > Question #2 from your original mail is also almost certainly a result of "over-repair." The "duplicate" chunks can be from any time. =Rob PS - What cassandra version?